Generative vs. Discriminative AI
AI Weirdness: the strange side of machine learning
( 2 min )
In recent years, significant progress in generative AI has highlighted the
important role of physics-inspired models that utilize advanced mathematical
concepts based on fundamental physics principles to enhance artificial
intelligence capabilities. Among these models, those based on diffusion
equations have greatly improved image quality. This study aims to explore the
potential uses of the Maxwell-Boltzmann equation, which forms the basis of the
kinetic theory of gases, and of the Michaelis-Menten model in Marketing Mix
Modelling (MMM) applications. We propose incorporating these equations into
Hierarchical Bayesian models to analyse consumer behaviour in the context of
advertising. These equation sets excel in accurately describing the random
dynamics in complex systems like social interactions and consumer-advertising
interactions.
( 2 min )
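The Michaelis-Menten model mentioned above is a saturation curve, which is what makes it a natural candidate for advertising response. A minimal sketch of the curve follows; the parameter values are hypothetical illustrations, not values from the paper.

```python
# Michaelis-Menten saturation curve as an advertising-response sketch
# (illustrative only; v_max and k_m are hypothetical parameters).

def michaelis_menten_response(spend, v_max, k_m):
    """Incremental response to ad spend.

    v_max: saturation level (maximum achievable response)
    k_m:   spend level at which half of v_max is reached
    """
    return v_max * spend / (k_m + spend)

# Diminishing returns: spend far past k_m adds little extra response.
low  = michaelis_menten_response(10.0,  v_max=100.0, k_m=50.0)
high = michaelis_menten_response(500.0, v_max=100.0, k_m=50.0)
print(round(low, 2), round(high, 2))
```

In a Hierarchical Bayesian MMM, `v_max` and `k_m` would be latent parameters inferred per channel rather than fixed constants.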
Adjoint operators have been found to be effective for exploring the inner
workings of CNNs [1]. However, the no-bias assumption of previous work
restricted its generality. We overcome the restriction by embedding input
images into an
extended normed space that includes bias in all CNN layers as part of the
extended space and propose an adjoint-operator-based algorithm that maps
high-level weights back to the extended input space for reconstructing an
effective hypersurface. Such a hypersurface can be computed for an arbitrary
unit in the CNN, and we prove that this reconstructed hypersurface, when
multiplied
by the original input (through an inner product), will precisely replicate the
output value of each unit. We show experimental results based on the CIFAR-10
and CIFAR-100 data sets where the proposed approach achieves near 0 activation
value reconstruction error.
( 2 min )
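A minimal sketch of the bias-embedding idea behind the extended normed space, shown for a single affine unit: appending a constant coordinate to the input lets the bias become one more weight, so a plain inner product reproduces the unit's output exactly. This illustrates only the embedding trick, not the paper's full adjoint-operator reconstruction.

```python
# Fold a unit's bias into an extended weight vector so that an inner
# product in the extended input space replicates the unit's output.

def affine_unit(x, w, b):
    # Pre-activation of one unit: <w, x> + b
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def extended(x):
    # Embed the input into the extended space by appending a constant 1.
    return x + [1.0]

def extended_weights(w, b):
    # The bias becomes one more coordinate of the weight vector.
    return w + [b]

x, w, b = [0.5, -2.0, 3.0], [1.0, 0.25, -1.0], 0.75
plain = affine_unit(x, w, b)
via_inner_product = sum(wi * xi
                        for wi, xi in zip(extended_weights(w, b), extended(x)))
assert abs(plain - via_inner_product) < 1e-12  # inner product replicates the unit
```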
We consider the straggler problem in decentralized learning over a logical
ring while preserving user data privacy. Specifically, we extend the recently
proposed framework of differential privacy (DP) amplification by
decentralization by Cyffers and Bellet to include overall training
latency, comprising both computation and communication latency. Analytical
results on both the convergence speed and the DP level are derived for both a
skipping scheme (which ignores the stragglers after a timeout) and a baseline
scheme that waits for each node to finish before the training continues. A
trade-off between overall training latency, accuracy, and privacy,
parameterized by the timeout of the skipping scheme, is identified and
empirically validated for logistic regression on a real-world dataset and for
image classification using the MNIST and CIFAR-10 datasets.
( 2 min )
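The contrast between the two schemes can be sketched with a toy per-round latency model (the latency distribution here is made up; the paper derives analytical results rather than simulating).

```python
import random

# Toy per-round training latency under the two schemes: the baseline waits
# for the slowest node, while the skipping scheme ignores stragglers after
# a timeout.
random.seed(0)

def round_latencies(n_nodes):
    # Per-node latency = computation + communication, with occasional stragglers.
    lats = []
    for _ in range(n_nodes):
        lat = random.uniform(0.1, 0.3)
        if random.random() < 0.1:          # 10% chance of straggling
            lat += random.uniform(1.0, 3.0)
        lats.append(lat)
    return lats

def baseline_round(lats):
    # Baseline: the round ends only when the slowest node finishes.
    return max(lats)

def skipping_round(lats, timeout):
    # Skipping: nodes past the timeout are ignored for this round.
    return min(max(lats), timeout)

lats = round_latencies(16)
print(baseline_round(lats), skipping_round(lats, timeout=0.5))
```

The timeout is exactly the knob that parameterizes the latency-accuracy-privacy trade-off identified in the paper: a smaller timeout lowers latency but discards more updates.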
This paper integrates manifold learning techniques within a \emph{Gaussian
process upper confidence bound} algorithm to optimize an objective function on
a manifold. Our approach is motivated by applications where a full
representation of the manifold is not available and querying the objective is
expensive. We rely on a point cloud of manifold samples to define a graph
Gaussian process surrogate model for the objective. Query points are
sequentially chosen using the posterior distribution of the surrogate model
given all previous queries. We establish regret bounds in terms of the number
of queries and the size of the point cloud. Several numerical examples
complement the theory and illustrate the performance of our method.
( 2 min )
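The sequential choice of query points follows the standard GP-UCB rule: pick the point-cloud sample maximizing posterior mean plus a scaled posterior standard deviation. A sketch of that selection step, with toy posterior values (in the actual method these would come from a graph Gaussian process conditioned on all previous queries):

```python
import math

# GP-UCB acquisition step over a point cloud of manifold samples.
# Posterior means/stds here are hypothetical placeholders.

def ucb_select(point_cloud, post_mean, post_std, beta):
    scores = [m + math.sqrt(beta) * s
              for m, s in zip(post_mean, post_std)]
    best = max(range(len(point_cloud)), key=scores.__getitem__)
    return point_cloud[best]

cloud = [(0.0, 0.0), (1.0, 0.5), (0.3, 0.9)]
mean  = [0.2, 0.8, 0.5]       # posterior means at the cloud points
std   = [0.30, 0.05, 0.40]    # posterior standard deviations
print(ucb_select(cloud, mean, std, beta=4.0))
```

With a large `beta` the rule favors uncertain points (exploration); with `beta=0` it degenerates to picking the highest posterior mean (exploitation).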
For a widely-studied data model and general loss and sample-hardening
functions, we prove that the Supervised Contrastive Learning (SCL), Hard-SCL
(HSCL), and Unsupervised Contrastive Learning (UCL) risks are minimized by
representations that exhibit Neural Collapse (NC), i.e., the class means form
an Equiangular Tight Frame (ETF) and data from the same class are mapped to
the same representation. We also prove that for any representation mapping, the
HSCL and Hard-UCL (HUCL) risks are lower bounded by the corresponding SCL and
UCL risks. Although the optimality of ETF is known for SCL, albeit only for
InfoNCE loss, its optimality for HSCL and UCL under general loss and hardening
functions is novel. Moreover, our proofs are simpler, more compact, and more
transparent. We empirically demonstrate, for the first time, that ADAM
optimization of HSCL and HUCL risks with random initialization and suitable
hardness levels can indeed converge to the NC geometry if we incorporate
unit-ball or unit-sphere feature normalization. Without incorporating hard
negatives or feature normalization, however, the representations learned via
ADAM suffer from dimensional collapse (DC) and fail to attain the NC geometry.
( 2 min )
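The simplex ETF geometry that characterizes Neural Collapse can be constructed and checked in a few lines: the K class means are the renormalized columns of I - (1/K)11^T, and every pair has cosine similarity exactly -1/(K-1).

```python
import math

# Construct a simplex Equiangular Tight Frame in R^K and verify the
# defining Neural Collapse property: pairwise cosines equal -1/(K-1).

def simplex_etf(K):
    cols = []
    for j in range(K):
        # Column j of I - (1/K) * 11^T, normalized to unit length.
        v = [(1.0 if i == j else 0.0) - 1.0 / K for i in range(K)]
        norm = math.sqrt(sum(x * x for x in v))
        cols.append([x / norm for x in v])
    return cols

def cosine(u, v):
    return sum(a * b for a, b in zip(u, v))  # columns are unit-norm

K = 5
etf = simplex_etf(K)
c = cosine(etf[0], etf[1])
print(round(c, 6), round(-1.0 / (K - 1), 6))   # both -0.25
```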
Federated Learning is expected to provide strong privacy guarantees, as only
gradients or model parameters but no plain text training data is ever exchanged
either between the clients or between the clients and the central server. In
this paper, we challenge this claim by introducing a simple but still very
effective membership inference attack algorithm, which relies only on a single
training step. In contrast to the popular honest-but-curious model, we
investigate a framework with a dishonest central server. Our strategy is
applicable to models with ReLU activations and uses the properties of this
activation function to achieve perfect accuracy. Empirical evaluation on visual
classification tasks with the MNIST, CIFAR10, CIFAR100 and CelebA datasets shows
that our method provides perfect accuracy in identifying one sample in a
training set with thousands of samples. Occasional failures of our method lead
us to discover duplicate images in the CIFAR100 and CelebA datasets.
( 2 min )
In data-driven systems, data exploration is imperative for making real-time
decisions. However, big data is stored in massive databases from which answers
are difficult to retrieve quickly. Approximate Query Processing (AQP) is a technique for providing
approximate answers to aggregate queries based on a summary of the data
(synopsis) that closely replicates the behavior of the actual data. AQP is
useful wherever an approximate answer to a query is acceptable in a fraction
of the real execution time. This study explores the novel utilization
of Generative Adversarial Networks (GANs) in the generation of tabular data
that can be employed in AQP for synopsis construction. We thoroughly
investigate the unique challenges posed by the synopsis construction process,
including maintaining data distribution characteristics, handling bounded
continuous and categorical data, and preserving semantic relationships, and then
introduce the advancement of tabular GAN architectures that overcome these
challenges. Furthermore, we propose and validate a suite of statistical metrics
tailored for assessing the reliability of the GAN-generated synopses. Our
findings demonstrate that advanced GAN variations exhibit a promising capacity
to generate high-fidelity synopses, potentially transforming the efficiency and
effectiveness of AQP in data-driven systems.
( 2 min )
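The core AQP idea can be sketched with the simplest possible synopsis, a uniform sample: an aggregate is answered from the small summary instead of scanning the full table. The paper's contribution is replacing this sample with a GAN-generated synopsis; the data below are synthetic.

```python
import random

# Approximate Query Processing sketch: answer AVG(price) from a 1% synopsis
# instead of the full table. The table is synthetic (prices ~ N(100, 15)).
random.seed(42)

table = [{"price": random.gauss(100.0, 15.0)} for _ in range(100_000)]
synopsis = random.sample(table, 1_000)      # 1% summary of the data

exact  = sum(r["price"] for r in table) / len(table)
approx = sum(r["price"] for r in synopsis) / len(synopsis)
print(round(exact, 2), round(approx, 2))    # close, at 1% of the scan cost
```

A GAN-based synopsis aims to keep this kind of aggregate accuracy while also preserving distributional shape, bounded domains, and cross-column semantic relationships that a plain sample may distort for rarer subpopulations.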
Self-supervised learning (SSL) for WiFi-based human activity recognition
(HAR) holds great promise due to its ability to address the challenge of
insufficient labeled data. However, directly transplanting SSL algorithms,
especially contrastive learning, from other domains to CSI data often fails to
achieve the expected performance. We attribute this issue
to the inappropriate alignment criteria, which disrupt the semantic distance
consistency between the feature space and the input space. To address this
challenge, we introduce \textbf{A}ntenna \textbf{R}esponse \textbf{C}onsistency
(ARC) as a solution to define proper alignment criteria. ARC is designed to
retain semantic information from the input space while introducing robustness
to real-world noise. Moreover, we substantiate the effectiveness of ARC through
a comprehensive set of experiments, demonstrating its capability to enhance the
performance of self-supervised learning for WiFi-based HAR, improving accuracy
by over 5\% in most cases and reaching a best accuracy of 94.97\%.
( 2 min )
With the development of trustworthy Federated Learning (FL), the requirement
of implementing right to be forgotten gives rise to the area of Federated
Unlearning (FU). Compared to centralized machine unlearning, a major challenge of FU lies
in the decentralized and privacy-preserving nature of FL, in which clients
jointly train a global model without sharing their raw data, making it
substantially more intricate to selectively unlearn specific information. In
that regard, many efforts have been made to tackle the challenges of FU and
have achieved significant progress. In this paper, we present a comprehensive
survey of FU. Specifically, we review existing algorithms, objectives, and
evaluation metrics, and identify several open challenges of FU. By comparing
these studies, we organize them into a taxonomy covering the various schemes,
potential applications, and future directions.
( 2 min )
Open-set recognition (OSR), the identification of novel categories, can be a
critical component when deploying classification models in real-world
applications. Recent work has shown that familiarity-based scoring rules such
as the Maximum Softmax Probability (MSP) or the Maximum Logit Score (MLS) are
strong baselines when the closed-set accuracy is high. However, one potential
weakness of familiarity-based OSR is vulnerability to adversarial attacks. Here,
we present gradient-based adversarial attacks on familiarity scores of both
types, False Familiarity and False Novelty, and evaluate their
effectiveness in informed and uninformed settings on TinyImageNet.
( 2 min )
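The two familiarity scores named above are computed directly from a classifier's logit vector: MSP is the maximum softmax probability, MLS the maximum raw logit. A minimal sketch (the logit values are made up):

```python
import math

# Familiarity scores for open-set recognition, from one logit vector.

def msp(logits):
    # Maximum Softmax Probability, with the usual max-shift for stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    return max(exps) / sum(exps)

def mls(logits):
    # Maximum Logit Score.
    return max(logits)

logits = [2.0, 0.5, -1.0]
print(round(msp(logits), 4), mls(logits))
# A low familiarity score flags a sample as potentially novel (open-set).
```

An adversarial False Familiarity attack would perturb the input to raise these scores on a novel sample; a False Novelty attack would lower them on a known-class sample.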
We prove an upper bound on the covering number of real algebraic varieties,
images of polynomial maps and semialgebraic sets. The bound substantially
improves on the best known bound of Yomdin-Comte, and its proof is much more
straightforward. As a consequence, our result gives a bound on the volume of the
tubular neighborhood of a real variety, improving the results by Lotz and
Basu-Lerario. We apply our theory to three main application domains. Firstly,
we derive a near-optimal bound on the covering number of low rank CP tensors.
Secondly, we prove a bound on the sketching dimension for (general) polynomial
optimization problems. Lastly, we deduce generalization error bounds for deep
neural networks with rational or ReLU activations, improving or matching the
best known results in the literature.
( 2 min )
In this work, we present Transformer-based Powered Descent Guidance (T-PDG),
a scalable algorithm for reducing the computational complexity of the direct
optimization formulation of the spacecraft powered descent guidance problem.
T-PDG uses data from prior runs of trajectory optimization algorithms to train
a transformer neural network, which accurately predicts the relationship
between problem parameters and the globally optimal solution for the powered
descent guidance problem. The solution is encoded as the set of tight
constraints corresponding to the constrained minimum-cost trajectory and the
optimal final time of landing. By leveraging the attention mechanism of
transformer neural networks, large sequences of time series data can be
accurately predicted when given only the spacecraft state and landing site
parameters. When applied to the real problem of Mars powered descent guidance,
T-PDG reduces the time for computing the 3-degree-of-freedom fuel-optimal
trajectory, when compared to lossless convexification, from the order of 1-8
seconds to less than 500 milliseconds. A safe and optimal solution is
guaranteed by including a feasibility check in T-PDG before returning the final
trajectory.
( 2 min )
In this paper, we address the limitations of the common data annotation and
training methods for objective single-label classification tasks. Typically,
when annotating such tasks, annotators are asked to provide only a single label
for each sample, and annotator disagreement is discarded when a final hard label
is decided through majority voting. We challenge this traditional approach,
acknowledging that determining the appropriate label can be difficult due to
the ambiguity and lack of context in the data samples. Rather than discarding
the information from such ambiguous annotations, our soft label method makes
use of them for training. Our findings indicate that additional annotator
information, such as confidence, secondary label and disagreement, can be used
to effectively generate soft labels. Training classifiers with these soft
labels then leads to improved performance and calibration on the hard label
test set.
( 2 min )
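One way the annotator information described above can be turned into a soft label is sketched below. The weighting scheme (confidence mass to the primary label, leftover mass to the secondary) is a hypothetical illustration, not the paper's exact method.

```python
from collections import Counter

# Turn per-annotator (primary label, confidence, optional secondary label)
# triples into a normalized soft label over the class set.

annotations = [
    ("cat", 0.9, None),
    ("cat", 0.6, "fox"),
    ("fox", 0.5, "cat"),
]

def soft_label(annotations, classes):
    mass = Counter()
    for primary, conf, secondary in annotations:
        mass[primary] += conf
        if secondary is not None:
            mass[secondary] += 1.0 - conf   # leftover mass to the secondary
    total = sum(mass.values())
    return {c: mass[c] / total for c in classes}

print(soft_label(annotations, ["cat", "fox"]))
```

Majority voting would collapse these three annotations to a hard "cat"; the soft label instead records that the sample is genuinely ambiguous, which is what improves calibration on the hard-label test set.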
The growing use of digital communication platforms has given rise to various
criminal activities, such as grooming and drug dealing, which pose significant
challenges to law enforcement and forensic experts. This paper presents a
supervised keyphrase extraction approach to detect relevant information in
high-volume chat logs involving grooming and drug dealing for forensic
analysis. The proposed method, JointKPE++, builds upon the JointKPE keyphrase
extractor by employing improvements to handle longer texts effectively. We
evaluate JointKPE++ using BERT-based pre-trained models on grooming and drug
dealing datasets, including BERT, RoBERTa, SpanBERT, and BERTimbau. The results
show significant improvements over traditional approaches and demonstrate the
potential for JointKPE++ to aid forensic experts in efficiently detecting
keyphrases related to criminal activities.
( 2 min )
We consider an unknown multivariate function representing a system, such as a
complex numerical simulator, taking both deterministic and uncertain inputs. Our
objective is to estimate the set of deterministic inputs leading to outputs
whose probability (with respect to the distribution of the uncertain inputs) of
belonging to a given set is less than a given threshold. This problem, which we
call Quantile Set Inversion (QSI), occurs for instance in the context of robust
(reliability-based) optimization problems, when looking for the set of
solutions that satisfy the constraints with sufficiently large probability. To
solve the QSI problem, we propose a Bayesian strategy based on Gaussian process
modeling and the Stepwise Uncertainty Reduction (SUR) principle, to
sequentially choose the points at which the function should be evaluated to
efficiently approximate the set of interest. We illustrate the performance and
interest of the proposed SUR strategy through several numerical experiments.
( 2 min )
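The QSI membership criterion can be made concrete with a plain Monte Carlo check: a deterministic input x belongs to the target set when the probability, over the uncertain input U, that the output falls in the given set is below the threshold. The function f and the distribution of U below are toy stand-ins for the unknown simulator; the paper's point is to avoid such brute-force evaluation via a GP surrogate and SUR.

```python
import random

# Monte Carlo sketch of the Quantile Set Inversion criterion:
# x is in the set iff  P_U( f(x, U) in [a, b] ) < alpha.
random.seed(1)

def f(x, u):
    return x + u                    # hypothetical system response

def prob_in_set(x, a, b, n=20_000):
    hits = sum(1 for _ in range(n)
               if a <= f(x, random.gauss(0.0, 1.0)) <= b)
    return hits / n

alpha, a, b = 0.1, 2.0, float("inf")   # "failure" event: output exceeds 2
x_safe, x_risky = 0.0, 1.8
print(prob_in_set(x_safe, a, b) < alpha,
      prob_in_set(x_risky, a, b) < alpha)
```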
Generalized self-concordance is a key property present in the objective
function of many important learning problems. We establish the convergence rate
of a simple Frank-Wolfe variant that uses the open-loop step size strategy
$\gamma_t = 2/(t+2)$, obtaining a $\mathcal{O}(1/t)$ convergence rate for this
class of functions in terms of primal gap and Frank-Wolfe gap, where $t$ is the
iteration count. This avoids the use of second-order information or the need to
estimate local smoothness parameters of previous work. We also show improved
convergence rates for various common cases, e.g., when the feasible region
under consideration is uniformly convex or polyhedral.
( 2 min )
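The open-loop step size $\gamma_t = 2/(t+2)$ requires no line search and no smoothness estimates, which is the practical appeal. A sketch of the Frank-Wolfe iteration with this step size on a toy problem, minimizing $\|x-b\|^2$ over the probability simplex (this quadratic objective is merely illustrative; the paper's class is generalized self-concordant functions):

```python
# Frank-Wolfe with the open-loop step size gamma_t = 2 / (t + 2).
# Feasible region: the probability simplex, whose linear minimization
# oracle (LMO) simply returns the vertex with the smallest gradient entry.

def frank_wolfe(b, iters=2000):
    n = len(b)
    x = [1.0 / n] * n                         # start at the barycenter
    for t in range(iters):
        grad = [2.0 * (xi - bi) for xi, bi in zip(x, b)]
        i = min(range(n), key=grad.__getitem__)   # LMO: best vertex e_i
        gamma = 2.0 / (t + 2.0)                   # open-loop step size
        x = [(1.0 - gamma) * xj for xj in x]
        x[i] += gamma
    return x

b = [0.7, 0.2, 0.1]                           # optimum lies in the simplex
x = frank_wolfe(b)
print([round(v, 3) for v in x])               # approaches b at an O(1/t) rate
```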
The aim of this study is to define the importance of predictors for black-box
machine learning methods, where the prediction function can be complex and
cannot be represented by statistical parameters. In this paper we define a
``Generalized Variable Importance Metric (GVIM)'' using the true conditional
expectation function for a continuous or a binary response variable. We further
show that the defined GVIM can be represented as a function of the
Conditional Average Treatment Effect (CATE) for multinomial and continuous
predictors. We then propose how the metric can be estimated using any
machine learning model. Finally, using simulations, we evaluate the properties
of the estimator when it is estimated with XGBoost, random forest, and a
mis-specified generalized additive model.
( 2 min )
When systems use data-driven models based on machine learning (ML),
errors in their results cannot be ruled out. This is particularly critical if
it remains unclear to the user how these models arrived at their decisions and
if errors can have safety-relevant consequences, as is often the case in the
medical field. In such cases, the use of dependable methods to quantify the
uncertainty remaining in a result allows the user to make an informed decision
about further usage and draw possible conclusions based on a given result. This
paper demonstrates the applicability and practical utility of the Uncertainty
Wrapper using flow cytometry as an application from the medical field that can
benefit from the use of ML models in conjunction with dependable and
transparent uncertainty quantification.
( 2 min )
In the recent past, using machine learning (ML) to make predictions, especially for data in the form of text and images, required extensive ML knowledge for creating and tuning of deep learning models. Today, ML has become more accessible to any user who wants to use ML models to generate business value. With Amazon SageMaker […]
( 7 min )
Creating high-performance machine learning (ML) solutions relies on exploring and optimizing training parameters, also known as hyperparameters. Hyperparameters are the knobs and levers that we use to adjust the training process, such as learning rate, batch size, regularization strength, and others, depending on the specific model and task at hand. Exploring hyperparameters involves systematically varying […]
( 20 min )
Thanks to a viral trend sweeping social media, we now know some men think about the Roman Empire every day. And thanks to Luke Farritor, a 21-year-old computer science undergrad at the University of Nebraska-Lincoln, and like-minded AI enthusiasts, there might soon be a lot more to think about. Blending a passion for history with […]
( 6 min )
Fetal brain MRI is becoming an increasingly relevant complement to
neurosonography for perinatal diagnosis, allowing fundamental insights into
fetal brain development throughout gestation. However, uncontrolled fetal
motion and heterogeneity in acquisition protocols lead to data of variable
quality, potentially biasing the outcome of subsequent studies. We present
FetMRQC, an open-source machine-learning framework for automated image quality
assessment and quality control that is robust to domain shifts induced by the
heterogeneity of clinical data. FetMRQC extracts an ensemble of quality metrics
from unprocessed anatomical MRI and combines them to predict experts' ratings
using random forests. We validate our framework on an unprecedentedly large and
diverse dataset of more than 1600 manually rated fetal brain T2-weighted images
from four clinical centers and 13 different scanners. Our study shows that
FetMRQC's predictions generalize well to unseen data while being interpretable.
FetMRQC is a step towards more robust fetal brain neuroimaging, which has the
potential to shed new light on the developing human brain.
( 3 min )
Deep learning has taken all fields involved in data analysis by storm,
including remote sensing for Earth observation. However, despite significant
advances in terms of performance, its lack of explainability and
interpretability, inherent to neural networks in general since their inception,
remains a major source of criticism. Hence it comes as no surprise that the
expansion of deep learning methods in remote sensing is being accompanied by
increasingly intensive efforts oriented towards addressing this drawback
through the exploration of a wide spectrum of Explainable Artificial
Intelligence techniques. This chapter, organized according to prominent Earth
observation application fields, presents a panorama of the state-of-the-art in
explainable remote sensing image analysis.
( 2 min )
We introduce the text-to-instrument task, which aims at generating
sample-based musical instruments based on textual prompts. Accordingly, we
propose InstrumentGen, a model that extends a text-prompted generative audio
framework to condition on instrument family, source type, pitch (across an
88-key spectrum), velocity, and a joint text/audio embedding. Furthermore, we
present a differentiable loss function to evaluate the intra-instrument timbral
consistency of sample-based instruments. Our results establish a foundational
text-to-instrument baseline, extending research in the domain of automatic
sample-based instrument generation.
( 2 min )
Recent AI research has significantly reduced the barriers to applying AI, but
the process of setting up the necessary tools and frameworks can still be a
challenge. While AI-as-a-Service platforms have emerged to simplify the
training and deployment of AI models, they still fall short of achieving true
democratization of AI. In this paper, we aim to address this gap by comparing
several popular AI-as-a-Service platforms and identifying the key requirements
for a platform that can achieve true democratization of AI. Our analysis
highlights the need for self-hosting options, high scalability, and openness.
To address these requirements, we propose our approach: the "Open Space for
Machine Learning" platform. Our platform is built on cutting-edge technologies
such as Kubernetes, Kubeflow Pipelines, and Ludwig, enabling us to overcome the
challenges of democratizing AI. We argue that our approach is more
comprehensive and effective in meeting the requirements of democratizing AI
than existing AI-as-a-Service platforms.
( 2 min )
The electrocardiogram (ECG) is a dependable instrument for assessing the
function of the cardiovascular system, and there has recently been much
emphasis on classifying ECGs precisely. While ECG classes share numerous
similarities, little attention has been paid to categorizing ECGs using graph
neural networks. In this study, we offer three distinct techniques that use
deep graph neural networks to classify heartbeats accurately. We extract
topological features from the ECG signal with different methods and then
classify the signals with a graph isomorphism network, a branch of graph
neural networks. On the PTB Diagnostics data set, the three proposed
techniques achieve arrhythmia classification accuracies of 99.38, 98.76, and
91.93 percent, respectively.
( 2 min )
This study presents an innovative method for predicting the market value of
professional soccer players using explainable machine learning models. Using a
dataset curated from the FIFA website, we employ an ensemble machine learning
approach coupled with Shapley Additive exPlanations (SHAP) to provide detailed
explanations of the models' predictions. The GBDT model achieves the highest
mean R-Squared (0.8780) and the lowest mean Root Mean Squared Error
(3,221,632.175), indicating its superior performance among the evaluated
models. Our analysis reveals that specific skills such as ball control, short
passing, finishing, interceptions, dribbling, and tackling are paramount within
the skill dimension, whereas sprint speed and acceleration are critical in the
fitness dimension, and reactions are preeminent in the cognitive dimension. Our
results offer a more accurate, objective, and consistent framework for market
value estimation, presenting useful insights for managerial decisions in player
transfers.
( 2 min )
Black-box variational inference performance is sometimes hindered by the use
of gradient estimators with high variance. This variance comes from two sources
of randomness: Data subsampling and Monte Carlo sampling. While existing
control variates only address Monte Carlo noise, and incremental gradient
methods typically only address data subsampling, we propose a new "joint"
control variate that jointly reduces variance from both sources of noise. This
significantly reduces gradient variance, leading to faster optimization in
several applications.
( 2 min )
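The basic control-variate mechanism behind the abstract above can be sketched in a few lines: subtract a correlated quantity with known expectation to shrink the estimator's variance without changing its mean. This toy shows only the generic mechanism; the paper's "joint" variate additionally targets data-subsampling noise in black-box variational inference.

```python
import math, random, statistics

# Control-variate sketch: reduce the variance of samples of g(X) = exp(X)
# using the correlated variate h(X) = X, whose mean (0) is known exactly.
random.seed(7)

xs = [random.gauss(0.0, 1.0) for _ in range(20_000)]
g = [math.exp(x) for x in xs]

# Near-optimal coefficient c = Cov(g, h) / Var(h), estimated from the sample.
mg, mx = statistics.mean(g), statistics.mean(xs)
cov = sum((gi - mg) * (xi - mx) for gi, xi in zip(g, xs)) / (len(xs) - 1)
c = cov / statistics.variance(xs)

# Same expectation (since E[h] = 0), strictly smaller variance.
adjusted = [gi - c * xi for gi, xi in zip(g, xs)]
print(statistics.variance(adjusted) < statistics.variance(g))
```

In the variational-inference setting, `g` would be a stochastic gradient estimate and the variate would be built to absorb both Monte Carlo and minibatch noise simultaneously.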
Contrastive learning has recently emerged as a promising approach for
learning data representations that discover and disentangle the explanatory
factors of the data. Previous analyses of such approaches have largely focused
on individual contrastive losses, such as noise-contrastive estimation (NCE)
and InfoNCE, and rely on specific assumptions about the data generating
process. This paper extends the theoretical guarantees for disentanglement to a
broader family of contrastive methods, while also relaxing the assumptions
about the data distribution. Specifically, we prove identifiability of the true
latents for four contrastive losses studied in this paper, without imposing
common independence assumptions. The theoretical findings are validated on
several benchmark datasets. Finally, practical limitations of these methods are
also investigated.
( 2 min )
In this paper, we develop data-dependent and algorithm-dependent
generalization bounds for transductive learning algorithms in the context of
information theory for the first time. We show that the generalization gap of
transductive learning algorithms can be bounded by the mutual information
between training labels and hypothesis. By innovatively proposing the concept
of transductive supersamples, we go beyond the inductive learning setting and
establish upper bounds in terms of various information measures. Furthermore,
we derive novel PAC-Bayesian bounds and build the connection between
generalization and loss landscape flatness under the transductive learning
setting. Finally, we present the upper bounds for adaptive optimization
algorithms and demonstrate the applications of results on semi-supervised
learning and graph learning scenarios. Our theoretical results are validated on
both synthetic and real-world datasets.
( 2 min )
The rising popularity of artificial intelligence in healthcare is
highlighting the problem that a computational model achieving super-human
clinical performance at its training sites may perform substantially worse at
new sites. In this perspective, we present common sources for this failure to
transport, which we divide into sources under the control of the experimenter
and sources inherent to the clinical data-generating process. Among the
inherent sources, we look more closely at site-specific clinical practices that
can affect the data distribution, and propose a potential solution intended to
isolate the imprint of those practices on the data from the patterns of disease
cause and effect that are the usual target of clinical models.
( 2 min )
In this paper, we present new high-probability PAC-Bayes bounds for different
types of losses. Firstly, for losses with a bounded range, we recover a
strengthened version of Catoni's bound that holds uniformly for all parameter
values. This leads to new fast rate and mixed rate bounds that are
interpretable and tighter than previous bounds in the literature. In
particular, the fast rate bound is equivalent to the Seeger--Langford bound.
Secondly, for losses with more general tail behaviors, we introduce two new
parameter-free bounds: a PAC-Bayes Chernoff analogue when the loss's cumulant
generating function is bounded, and a bound when the loss's second moment is
bounded. These two bounds are obtained using a new technique based on a
discretization of the space of possible events for the "in probability"
parameter optimization problem. This technique is both simpler and more general
than previous approaches optimizing over a grid on the parameters' space.
Finally, we extend all previous results to anytime-valid bounds using a simple
technique applicable to any existing bound.
( 2 min )
Neural networks have shown remarkable performance in computer vision, but
their deployment in numerous scientific and technical fields is challenging due
to their black-box nature. Scientists and practitioners need to evaluate the
reliability of a decision, i.e., to know simultaneously if a model relies on
the relevant features and whether these features are robust to image
corruptions. Existing attribution methods aim to provide human-understandable
explanations by highlighting important regions in the image domain, but fail to
fully characterize a decision process's reliability. To bridge this gap, we
introduce the Wavelet sCale Attribution Method (WCAM), a generalization of
attribution from the pixel domain to the space-scale domain using wavelet
transforms. Attribution in the wavelet domain reveals where and on what scales
the model focuses, thus enabling us to assess whether a decision is reliable.
Our code is accessible here:
\url{https://github.com/gabrielkasmi/spectral-attribution}.
( 2 min )
Good data stewardship requires removal of data at the request of the data's
owner. This raises the question of whether and how a trained machine-learning model,
which implicitly stores information about its training data, should be affected
by such a removal request. Is it possible to "remove" data from a
machine-learning model? We study this problem by defining certified removal: a
very strong theoretical guarantee that a model from which data is removed
cannot be distinguished from a model that never observed the data to begin
with. We develop a certified-removal mechanism for linear classifiers and
empirically study learning settings in which this mechanism is practical.
( 2 min )
In this paper, we develop a new Cram\'er-Rao Bound (CRB) for the case where the
parameter to estimate lies in a manifold and follows a prior distribution. This
derivation leads to a natural inequality between an error criterion based on
geometrical properties and this new bound. This main contribution is
illustrated in the problem of covariance estimation when the data follow a
Gaussian distribution and the prior distribution is an inverse Wishart.
Numerical simulations show new results where the proposed CRB allows us to
exhibit interesting properties of the MAP estimator that are not observed with
the classical Bayesian CRB.
( 2 min )
This paper establishes the nearly optimal rate of approximation for deep
neural networks (DNNs) when applied to Korobov functions, effectively
overcoming the curse of dimensionality. The approximation results presented in
this paper are measured with respect to $L_p$ norms and $H^1$ norms. Our
achieved approximation rate demonstrates a remarkable "super-convergence" rate,
outperforming traditional methods and any continuous function approximator.
These results are non-asymptotic, providing error bounds that consider both the
width and depth of the networks simultaneously.
( 2 min )
Happ and Greven (2018) developed a methodology for principal components
analysis of multivariate functional data observed on domains of different
dimensions. Their approach relies on an estimation of univariate
functional principal components for each univariate functional feature. In this
paper, we present extensive simulations to investigate choosing the number of
principal components to retain. We show empirically that the conventional
approach of using a percentage of variance explained threshold for each
univariate functional feature may be unreliable when aiming to explain an
overall percentage of variance in the multivariate functional data, and thus we
advise practitioners to be careful when using it.
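The gap between per-feature and overall component selection can be illustrated with ordinary multivariate PCA (a deliberate simplification of the functional setting; the data and thresholds below are illustrative, not from the paper):

```python
import numpy as np

def n_components_for(eigvals, threshold=0.9):
    """Smallest number of leading components whose cumulative
    share of variance reaches the threshold."""
    frac = np.cumsum(eigvals) / np.sum(eigvals)
    return int(np.searchsorted(frac, threshold) + 1)

def pca_variances(X):
    """Eigenvalues of the sample covariance, in descending order."""
    return np.linalg.eigvalsh(np.cov((X - X.mean(axis=0)).T))[::-1]

rng = np.random.default_rng(1)
n = 500
# Two "features": one with concentrated variance, one with spread-out variance.
A = rng.normal(size=(n, 5)) * np.array([10.0, 1.0, 0.5, 0.3, 0.2])
B = rng.normal(size=(n, 5)) * np.array([3.0, 2.5, 2.0, 1.5, 1.0])

per_block = n_components_for(pca_variances(A)) + n_components_for(pca_variances(B))
joint = n_components_for(pca_variances(np.hstack([A, B])))
# Applying a 90% threshold per block vs. on the joint data generally
# selects different numbers of components.
print(per_block, joint)
```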
( 2
min )
Building out a machine learning operations (MLOps) platform in the rapidly evolving landscape of artificial intelligence (AI) and machine learning (ML) for organizations is essential for seamlessly bridging the gap between data science experimentation and deployment while meeting the requirements around model performance, security, and compliance. In order to fulfill regulatory and compliance requirements, the […]
( 17
min )
Generative AI models for coding companions are mostly trained on publicly available source code and natural language text. While the large size of the training corpus enables the models to generate code for commonly used functionality, these models are unaware of code in private repositories and the associated coding styles that are enforced when developing […]
( 11
min )
Wield the blade and embrace the way of the samurai for some thrilling action — Onimusha: Warlords comes to GeForce NOW this week. Members can experience feudal Japan in this hack-and-slash adventure game in the cloud. It’s part of an action-packed GFN Thursday, with 16 more games joining the cloud gaming platform’s library. Forging Destinies Read article >
( 5
min )
Working together to create open-source and private datasets for AI training.
( 2
min )
It is commonly recognized that the expressiveness of deep neural networks is
contingent upon a range of factors, encompassing their depth, width, and other
relevant considerations. Currently, the practical performance of the majority
of deep neural networks remains uncertain. For ReLU (Rectified Linear Unit)
networks with piecewise linear activations, the number of linear convex regions
serves as a natural metric to gauge the network's expressivity. In this paper,
we count the number of linear convex regions in deep neural networks based on
ReLU. In particular, we prove that for any one-dimensional input, there exists
a minimum threshold for the number of neurons required to express it. We also
empirically observe that for the same network, intricate inputs hinder its
capacity to express linear regions. Furthermore, we unveil the iterative
refinement process of decision boundaries in ReLU networks during training. We
hope that our research will inspire network optimization efforts and aid the
exploration and analysis of the behaviors exhibited by deep networks.
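For one-dimensional inputs, linear regions can be counted directly from activation patterns. The sketch below (a small random ReLU network of our own, purely illustrative) estimates the count by sweeping a dense grid and counting pattern changes:

```python
import numpy as np

rng = np.random.default_rng(0)
# A small random ReLU network: 1 -> 8 -> 8 (biases included).
W1, b1 = rng.normal(size=(8, 1)), rng.normal(size=8)
W2, b2 = rng.normal(size=(8, 8)), rng.normal(size=8)

xs = np.linspace(-5, 5, 20_000)
H1 = W1 @ xs[None, :] + b1[:, None]   # layer-1 pre-activations, shape (8, N)
A1 = np.maximum(H1, 0.0)
H2 = W2 @ A1 + b2[:, None]            # layer-2 pre-activations, shape (8, N)

# Each maximal interval with a constant on/off pattern of all ReLU units is
# one linear region, so counting pattern changes along the grid counts regions.
signs = np.vstack([H1 > 0, H2 > 0])   # (16, N) activation patterns
changes = np.any(signs[:, 1:] != signs[:, :-1], axis=0)
n_regions = 1 + int(changes.sum())
print("estimated linear regions on [-5, 5]:", n_regions)
```

For this architecture the count is provably at most 81 on all of the real line (at most 8 breakpoints from the first layer, and at most one zero-crossing per second-layer unit on each of the resulting intervals).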
( 2
min )
Minimum Description Length (MDL) estimators, using two-part codes for
universal coding, are analyzed. For general parametric families under certain
regularity conditions, we introduce a two-part code whose regret is close to
the minimax regret, where regret of a code with respect to a target family M is
the difference between the code length of the code and the ideal code length
achieved by an element in M. This is a generalization of the result for
exponential families by Gr\"unwald. Our code is constructed by using an
augmented structure of M with a bundle of local exponential families for data
description, which is not needed for exponential families. This result gives a
tight upper bound on risk and loss of the MDL estimators based on the theory
introduced by Barron and Cover in 1991. Further, we show that we can apply the
result to mixture families, which are a typical example of non-exponential
families.
( 2
min )
The diffusion model has shown remarkable success in computer vision, but it
remains unclear whether the ODE-based probability flow or the SDE-based
diffusion model is superior, and under what circumstances. Comparing the
two is challenging due to dependencies on data distributions, score training,
and other numerical issues. In this paper, we study the problem mathematically
for two limiting scenarios: the zero diffusion (ODE) case and the large
diffusion case. We first introduce a pulse-shape error to perturb the score
function and analyze error accumulation of sampling quality, followed by a
thorough analysis for generalization to arbitrary error. Our findings indicate
that when the perturbation occurs at the end of the generative process, the ODE
model outperforms the SDE model with a large diffusion coefficient. However,
when the perturbation occurs earlier, the SDE model outperforms the ODE model,
and we demonstrate that the error of sample generation due to such a
pulse-shape perturbation is exponentially suppressed as the diffusion term's
magnitude increases to infinity. Numerical validation of this phenomenon is
provided using Gaussian, Gaussian mixture, and Swiss roll distribution, as well
as realistic datasets like MNIST and CIFAR-10.
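The two samplers are easiest to compare in a toy setting where the score is known exactly. The sketch below assumes (our choice, for illustration) a variance-preserving forward SDE dx = -x/2 dt + dW with data N(0, 1), which is stationary, so the exact score is simply -x at every time:

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, steps = 20_000, 5.0, 500
dt = T / steps
score = lambda x, t: -x  # exact score for the stationary N(0, 1) marginal

x_ode = rng.normal(size=n)  # start both samplers from the N(0, 1) prior
x_sde = rng.normal(size=n)
for k in range(steps):
    t = T - k * dt
    # Probability-flow ODE, integrated backward: dx = [f - (g^2/2) score] dt
    x_ode -= (-x_ode / 2 - 0.5 * score(x_ode, t)) * dt
    # Reverse SDE, backward Euler-Maruyama: dx = [f - g^2 score] dt + g dW
    x_sde -= (-x_sde / 2 - score(x_sde, t)) * dt
    x_sde += np.sqrt(dt) * rng.normal(size=n)

# Both samplers should preserve the target statistics (mean 0, variance 1).
print(x_ode.var(), x_sde.var())
```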
( 2
min )
Accurate detection of human presence in indoor environments is important for
various applications, such as energy management and security. In this paper, we
propose a novel system for human presence detection using the channel state
information (CSI) of WiFi signals. Our system, named attention-enhanced deep
learning for presence detection (ALPD), employs an attention mechanism to
automatically select informative subcarriers from the CSI data and a
bidirectional long short-term memory (LSTM) network to capture temporal
dependencies in CSI. Additionally, we utilize a static feature to improve the
accuracy of human presence detection in static states. We evaluate the proposed
ALPD system by deploying a pair of WiFi access points (APs) to collect a CSI
dataset, and compare it with several benchmarks. The results
demonstrate that our ALPD system outperforms the benchmarks in terms of
accuracy, especially in the presence of interference. Moreover, bidirectional
transmission data benefits training by improving stability and accuracy,
while reducing the cost of data collection. Overall, our
proposed ALPD system shows promising results for human presence detection using
WiFi CSI signals.
( 2
min )
We consider two popular approaches to Knowledge Graph Completion (KGC):
textual models that rely on textual entity descriptions, and structure-based
models that exploit the connectivity structure of the Knowledge Graph (KG).
Preliminary experiments show that these approaches have complementary
strengths: structure-based models perform well when the gold answer is easily
reachable from the query head in the KG, while textual models exploit
descriptions to give good performance even when the gold answer is not
reachable. In response, we explore ensembling as a way of combining the best of
both approaches. We propose a novel method for learning query-dependent
ensemble weights by using the distributions of scores assigned by individual
models to all candidate entities. Our ensemble baseline achieves
state-of-the-art results on three standard KGC datasets, with up to 6.8 pt MRR
and 8.3 pt Hits@1 gains over best individual models.
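A query-dependent ensemble can be sketched in a few lines. The weighting rule below (confidence proxied by the entropy of each model's score distribution) is a simplified stand-in for the paper's learned weighting, with made-up toy scores:

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def ensemble_scores(text_scores, struct_scores, temperature=1.0):
    """Combine two models' candidate-entity scores with a query-dependent
    weight derived from each score distribution: a peakier (more confident)
    distribution gets more weight."""
    p_text, p_struct = softmax(text_scores), softmax(struct_scores)
    # Confidence proxy: entropy of each model's distribution over candidates.
    ent = np.array([-(p_text * np.log(p_text)).sum(),
                    -(p_struct * np.log(p_struct)).sum()])
    w_text, w_struct = softmax(-ent / temperature)
    return w_text * p_text + w_struct * p_struct

# Toy query with 4 candidate entities: the structure-based model is confident
# (gold entity easily reachable in the KG), the textual model is not.
struct = np.array([8.0, 0.5, 0.2, 0.1])
text = np.array([1.0, 0.9, 1.1, 0.8])
combined = ensemble_scores(text, struct)
print("predicted entity:", int(np.argmax(combined)))
```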
( 2
min )
There is currently a large gap in performance between the statistically
rigorous methods like linear regression or additive splines and the powerful
deep methods using neural networks. Previous works attempting to close this gap
have failed to fully investigate the exponentially growing number of feature
combinations which deep networks consider automatically during training. In
this work, we develop a tractable selection algorithm to efficiently identify
the necessary feature combinations by leveraging techniques in feature
interaction detection. Our proposed Sparse Interaction Additive Networks (SIAN)
construct a bridge from these simple and interpretable models to fully
connected neural networks. SIAN achieves competitive performance against
state-of-the-art methods across multiple large-scale tabular datasets and
consistently finds an optimal tradeoff between the modeling capacity of neural
networks and the generalizability of simpler methods.
( 2
min )
Large Language Models (LLMs) are huge artificial neural networks which
primarily serve to generate text, but also provide a very sophisticated
probabilistic model of language use. Since generating a semantically consistent
text requires a form of effective memory, we investigate the memory properties
of LLMs and find surprising similarities with key characteristics of human
memory. This result strongly suggests that the biological features of human
memory leave an imprint on the way that we structure our textual narratives.
( 2
min )
We study differentially private stochastic convex optimization (DP-SCO) under
user-level privacy, where each user may hold multiple data items. Existing work
for user-level DP-SCO either requires super-polynomial runtime [Ghazi et al.
(2023)] or requires the number of users to grow polynomially with the
dimensionality of the problem with additional strict assumptions [Bassily et
al. (2023)]. We develop new algorithms for user-level DP-SCO that obtain
optimal rates for both convex and strongly convex functions in polynomial time
and require the number of users to grow only logarithmically in the dimension.
Moreover, our algorithms are the first to obtain optimal rates for non-smooth
functions in polynomial time. These algorithms are based on multiple-pass
DP-SGD, combined with a novel private mean estimation procedure for
concentrated data, which applies an outlier removal step before estimating the
mean of the gradients.
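The outlier-removal-then-estimate pattern can be illustrated with a minimal sketch (a simplified illustration with invented radii and noise scales, not the paper's exact procedure or its privacy accounting): drop user averages far from a robust center, then average the survivors and add Gaussian noise.

```python
import numpy as np

def private_mean(user_grads, radius, noise_scale, rng):
    """Sketch of mean estimation for concentrated data: remove user
    averages far from the coordinate-wise median, then average the
    survivors and add Gaussian noise."""
    center = np.median(user_grads, axis=0)
    dists = np.linalg.norm(user_grads - center, axis=1)
    kept = user_grads[dists <= radius]
    return kept.mean(axis=0) + rng.normal(scale=noise_scale, size=kept.shape[1])

rng = np.random.default_rng(0)
true_mean = np.array([1.0, -1.0, 0.5])
grads = true_mean + 0.1 * rng.normal(size=(200, 3))  # concentrated user averages
grads[0] = np.array([50.0, 50.0, 50.0])              # one wild outlier
est = private_mean(grads, radius=1.0, noise_scale=0.01, rng=rng)
print(est)  # close to true_mean; the outlier is removed rather than averaged in
```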
( 2
min )
In this paper, we present the results of the NeurIPS-2022 Neural MMO
Challenge, which attracted 500 participants and received over 1,600
submissions. Like the previous IJCAI-2022 Neural MMO Challenge, it involved
agents from 16 populations surviving in procedurally generated worlds by
collecting resources and defeating opponents. This year's competition ran on
the latest v1.6 Neural MMO, which introduces new equipment, combat, trading,
and a better scoring system. These elements combine to pose additional
robustness and generalization challenges not present in previous competitions.
This paper summarizes the design and results of the challenge, explores the
potential of this environment as a benchmark for learning methods, and presents
some practical reinforcement learning training approaches for complex tasks
with sparse rewards. Additionally, we have open-sourced our baselines,
including environment wrappers, benchmarks, and visualization tools for future
research.
( 2
min )
Discriminatively trained, deterministic neural networks are the de facto
choice for classification problems. However, even though they achieve
state-of-the-art results on in-domain test sets, they tend to be overconfident
on out-of-distribution (OOD) data. For instance, ReLU networks -- a popular
class of neural network architectures -- have been shown to almost always yield
high confidence predictions when the test data are far away from the training
set, even when they are trained with OOD data. We overcome this problem by
adding a term to the output of the neural network that corresponds to the logit
of an extra class, which we design to dominate the logits of the original
classes as we move away from the training data. This technique provably prevents
arbitrarily high confidence on far-away test data while maintaining a simple
discriminative point-estimate training. Evaluation on various benchmarks
demonstrates strong performance against competitive baselines on both far-away
and realistic OOD data.
( 2
min )
Federated learning (FL) has shown promising potential in safeguarding data
privacy in healthcare collaborations. While the term "FL" was originally coined
by the engineering community, the statistical field has also explored similar
privacy-preserving algorithms. Statistical FL algorithms, however, remain
considerably less recognized than their engineering counterparts. Our goal was
to bridge the gap by presenting the first comprehensive comparison of FL
frameworks from both engineering and statistical domains. We evaluated five FL
frameworks using both simulated and real-world data. The results indicate that
statistical FL algorithms yield less biased point estimates for model
coefficients and offer convenient confidence interval estimations. In contrast,
engineering-based methods tend to generate more accurate predictions, sometimes
surpassing central pooled and statistical FL models. This study underscores the
relative strengths and weaknesses of both types of methods, emphasizing the
need for increased awareness and their integration in future FL applications.
( 2
min )
In this paper, neural network approximation methods are developed for
elliptic partial differential equations with multi-frequency solutions. Neural
network approximation methods have advantages over classical approaches in
that they can be applied without much concern about the form of the differential
equation or the shape and dimension of the problem domain. When applied to
problems with multi-frequency solutions, the performance and accuracy of neural
network approximation methods are strongly affected by the contrast of the
high- and low-frequency parts in the solutions. To address this issue, domain
scaling and residual correction methods are proposed. The efficiency and
accuracy of the proposed methods are demonstrated for multi-frequency model
problems.
( 2
min )
Research in scientific disciplines evolves, often rapidly, over time with the
emergence of novel methodologies and their associated terminologies. While
methodologies themselves are conceptual in nature and rather difficult to
extract and characterise automatically, in this paper we seek to develop
supervised models for automatic extraction of the names of the various
constituents of a methodology, e.g., `R-CNN', `ELMo', etc. The main research
challenge for this task is effectively modeling the contexts around these
methodology component names in a few-shot or even a zero-shot setting. The main
contributions of this paper towards effectively identifying new evolving
scientific methodology names are as follows: i) we propose a factored approach
to sequence modeling, which leverages a broad-level category information of
methodology domains, e.g., `NLP', `RL' etc.; ii) to demonstrate the feasibility
of our proposed approach of identifying methodology component names under a
practical setting of fast evolving AI literature, we conduct experiments
following a simulated chronological setup (newer methodologies not seen during
the training process); iii) our experiments demonstrate that the factored
approach outperforms state-of-the-art baselines by margins of up to 9.257\% for
the methodology extraction task with the few-shot setup.
( 2
min )
We present a new high-level synthesis methodology for using large language
model tools to generate hardware designs. The methodology uses exclusively
open-source tools, apart from the large language model itself. As a case study, we use
our methodology to generate a permuted congruential random number generator
design with a wishbone interface. We verify the functionality and quality of
the random number generator design using large language model-generated
simulations and the Dieharder randomness test suite. We document all the large
language model chat logs, Python scripts, Verilog scripts, and simulation
results used in the case study. We believe that our method of hardware design
generation coupled with the open source silicon 130 nm design tools will
revolutionize application-specific integrated circuit design. Our methodology
significantly lowers the bar to entry when building domain-specific computing
accelerators for the Internet of Things and proof of concept prototypes for
later fabrication in more modern process nodes.
( 2
min )
Federated learning (FL) is an emerging paradigm for training deep neural
networks (DNNs) in a distributed manner. Current FL approaches suffer from
high communication overhead and the risk of information leakage. In this work, we present a
federated learning algorithm based on evolution strategies (FedES), a
zeroth-order training method. Instead of transmitting model parameters, FedES
only communicates loss values, and thus has very low communication overhead.
Moreover, a third party is unable to estimate gradients without knowing the
pre-shared seed, which protects data privacy. Experimental results demonstrate
FedES can achieve the above benefits while keeping convergence performance the
same as that with back propagation methods.
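The seed-sharing trick can be sketched on a toy problem. The following is a minimal illustration (made-up quadratic client objectives and hyperparameters, not the paper's full FedES protocol): because server and clients regenerate the same perturbations from a pre-shared seed, only scalar losses ever cross the network.

```python
import numpy as np

SEED, DIM, POP, SIGMA, LR = 1234, 5, 50, 0.1, 0.1

def client_losses(theta, round_id, data_mean):
    """A client regenerates the shared perturbations from the pre-shared seed,
    evaluates them on its private data, and returns only POP scalar losses."""
    rng = np.random.default_rng(SEED + round_id)
    eps = rng.normal(size=(POP, DIM))
    return np.array([np.sum((theta + SIGMA * e - data_mean) ** 2) for e in eps])

theta = np.zeros(DIM)
targets = [np.full(DIM, 1.0), np.full(DIM, 3.0)]  # two clients' private optima
for rnd in range(200):
    # The server aggregates only loss values -- no parameters or gradients move.
    losses = sum(client_losses(theta, rnd, t) for t in targets)
    rng = np.random.default_rng(SEED + rnd)  # same seed => same perturbations
    eps = rng.normal(size=(POP, DIM))
    grad_est = eps.T @ (losses - losses.mean()) / (POP * SIGMA)
    theta -= LR * grad_est                   # zeroth-order (ES) update
print(theta)  # approaches the consensus optimum (all coordinates near 2)
```

A third party observing only the loss values cannot reconstruct the perturbations, and hence the gradient estimate, without the seed.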
( 2
min )
One of the most promising developments in computer vision in recent years is
the use of generative neural networks for condition-based 3D design
reconstruction and generation, where functionality requirements are translated
into conditions on a geometry. Neural networks learn the dependencies between
functionalities and geometry very effectively, but the more conditions the
design generation needs to reflect, the more difficult it is to learn clear
dependencies. This yields a multi-criteria design problem arising from the
various conditions, which neural network structures have not considered so far.
In this paper, we address this multi-criteria challenge for a 3D design use
case related to an unmanned aerial vehicle (UAV) motor mount. We generate
10,000 abstract 3D designs and subject them all to simulations for three
physical disciplines: mechanics, thermodynamics, and aerodynamics. Then, we
train a Conditional Variational Autoencoder (CVAE) using the geometry and
corresponding multi-criteria functional constraints as input. We use the
trained CVAE together with the Marching Cubes algorithm to generate meshes for
simulation-based evaluation of the generated UAV
designs. Subsequently, we demonstrate the ability to generate optimized designs
under self-defined functionality conditions using the trained neural network.
( 3
min )
Consistency-based diagnosis is an established approach to diagnose technical
applications, but suffers from significant modeling efforts, especially for
dynamic multi-modal time series. Machine learning seems to be an obvious
solution, which becomes less obvious when looking at details: Which notion of
consistency can be used? If logical calculi are still to be used, how can
dynamic time series be transferred into the discrete world?
This paper presents the methodology Discret2Di for automated learning of
logical expressions for consistency-based diagnosis. While these logical
calculi have advantages by providing a clear notion of consistency, they have
the key problem of relying on a discretization of the dynamic system. The
solution presented combines machine learning from both the time series and the
symbolic domain to automate the learning of logical rules for consistency-based
diagnosis.
( 2
min )
The adoption of diagnosis and prognostic algorithms in healthcare has led to
concerns about the perpetuation of bias against disadvantaged groups of
individuals. Deep learning methods to detect and mitigate bias have revolved
around modifying models, optimization strategies, and threshold calibration
with varying levels of success. Here, we present a data-centric,
model-agnostic, task-agnostic approach to evaluating dataset bias by
investigating how easily different groups are learned
at small sample sizes (AEquity). We then apply a systematic analysis of AEq
values across subpopulations to identify and mitigate manifestations of racial
bias in two known cases in healthcare - chest X-ray diagnosis with deep
convolutional neural networks and healthcare utilization prediction with
multivariate logistic regression. AEq is a novel and broadly applicable metric
that can be applied to advance equity by diagnosing and remediating bias in
healthcare datasets.
( 2
min )
Visualization tools can help synthetic biologists and molecular programmers
understand the complex reactive pathways of nucleic acid reactions, which can
be designed for many potential applications and can be modelled using a
continuous-time Markov chain (CTMC). Here we present ViDa, a new visualization
approach for DNA reaction trajectories that uses a 2D embedding of the
secondary structure state space underlying the CTMC model. To this end, we
integrate a scattering transform of the secondary structure adjacency, a
variational autoencoder, and a nonlinear dimensionality reduction method. We
augment the training loss with domain-specific supervised terms that capture
both thermodynamic and kinetic features. We assess ViDa on two well-studied DNA
hybridization reactions. Our results demonstrate that the domain-specific
features lead to significant quality improvements over the state-of-the-art in
DNA state space visualization, successfully separating different folding
pathways and thus providing useful insights into dominant reaction mechanisms.
( 2
min )
We attempt to generate new bridge types using generative artificial
intelligence technology. Grayscale images of bridge facades with varying
component widths were rendered with the 3ds Max animation software, and the
OpenCV module then applied geometric transformations (rotation, horizontal
scaling, vertical scaling) to obtain an image dataset of three-span beam, arch,
cable-stayed, and suspension bridges. Using the Python programming language and
the TensorFlow and Keras deep learning frameworks, a variational autoencoder
was constructed and trained, yielding a low-dimensional bridge-type latent
space that is convenient for vector operations. The variational autoencoder can
combine two existing, human-designed bridge types into a new one. Generative
artificial intelligence technology can thus assist bridge designers in
bridge-type innovation and serve as a copilot.
( 2
min )
We introduce AdaSub, a stochastic optimization algorithm that computes a
search direction based on second-order information in a low-dimensional
subspace that is defined adaptively based on available current and past
information. Compared to first-order methods, second-order methods exhibit
better convergence characteristics, but the need to compute the Hessian matrix
at each iteration results in excessive computational expenses, making them
impractical. To address this issue, our approach enables the management of
computational expenses and algorithm efficiency by enabling the selection of
the subspace dimension for the search. Our code is freely available on GitHub,
and our preliminary numerical results demonstrate that AdaSub surpasses popular
stochastic optimizers in terms of time and number of iterations required to
reach a given accuracy.
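The subspace idea can be sketched on a quadratic. The toy below (our own illustration, not AdaSub's exact subspace construction) builds an orthonormal basis from the last few gradients and solves a small Newton system restricted to that subspace, using only Hessian-vector products:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3  # problem dimension and adaptive subspace dimension
G = rng.normal(size=(n, n))
A = G @ G.T + n * np.eye(n)       # SPD Hessian of a quadratic objective
b = rng.normal(size=n)
grad = lambda x: A @ x - b        # gradient of f(x) = x'Ax/2 - b'x
hvp = lambda v: A @ v             # Hessian-vector product (no full Hessian)

x = np.zeros(n)
history = []
for it in range(20):
    g = grad(x)
    history.append(g)
    # Adaptive subspace: orthonormal basis of the last k gradients.
    V, _ = np.linalg.qr(np.stack(history[-k:], axis=1))
    # Second-order model on the subspace: solve (V' A V) s = -V' g.
    H_sub = V.T @ np.column_stack([hvp(V[:, j]) for j in range(V.shape[1])])
    s = np.linalg.solve(H_sub, -V.T @ g)
    x = x + V @ s
x_star = np.linalg.solve(A, b)
print(np.linalg.norm(x - x_star))  # the subspace Newton steps approach x*
```

The per-iteration cost is k Hessian-vector products plus a k-by-k solve, which is how the subspace dimension trades accuracy against expense.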
( 2
min )
As control engineering methods are applied to increasingly complex systems,
data-driven approaches for system identification appear as a promising
alternative to physics-based modeling. While the Bayesian approaches prevalent
for safety-critical applications usually rely on the availability of state
measurements, the states of a complex system are often not directly measurable.
It may then be necessary to jointly estimate the dynamics and the latent state,
making the quantification of uncertainties and the design of controllers with
formal performance guarantees considerably more challenging. This paper
proposes a novel method for the computation of an optimal input trajectory for
unknown nonlinear systems with latent states based on a combination of particle
Markov chain Monte Carlo methods and scenario theory. Probabilistic performance
guarantees are derived for the resulting input trajectory, and an approach to
validate the performance of arbitrary control laws is presented. The
effectiveness of the proposed method is demonstrated in a numerical simulation.
( 2
min )
The mean shift (MS) algorithm seeks a mode of the kernel density estimate
(KDE). This study presents a convergence guarantee of the mode estimate
sequence generated by the MS algorithm and an evaluation of the convergence
rate, under fairly mild conditions, with the help of the argument concerning
the {\L}ojasiewicz inequality. Our findings extend existing ones covering
analytic kernels and the Epanechnikov kernel. These extensions are significant in that
they cover the biweight kernel, which is optimal among non-negative kernels in
terms of the asymptotic statistical efficiency for the KDE-based mode
estimation.
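The MS iteration itself is short. For a Gaussian kernel (used here only because it keeps the sketch to a few lines; the results above notably also cover non-analytic kernels like the biweight) each step moves the estimate to the kernel-weighted average of the data:

```python
import numpy as np

def mean_shift(x, data, bandwidth=0.5, iters=200):
    """Mean shift with a Gaussian kernel: repeatedly move x to the
    kernel-weighted average of the data, ascending the KDE toward a mode."""
    for _ in range(iters):
        w = np.exp(-0.5 * ((data - x) / bandwidth) ** 2)
        x = np.sum(w * data) / np.sum(w)
    return x

rng = np.random.default_rng(0)
# Bimodal sample: the KDE has modes near -2 and +2.
data = np.concatenate([rng.normal(-2, 0.3, 300), rng.normal(2, 0.3, 300)])
mode = mean_shift(0.5, data, bandwidth=0.3)
print(mode)  # converges to the mode nearest the starting point, near +2
```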
( 2
min )
We present an exact Bayesian inference method for discrete statistical
models, which can find exact solutions to a large class of discrete inference
problems, even with infinite support and continuous priors. To express such
models, we introduce a probabilistic programming language that supports
discrete and continuous sampling, discrete observations, affine functions,
(stochastic) branching, and conditioning on discrete events. Our key tool is
probability generating functions: they provide a compact closed-form
representation of distributions that are definable by programs, thus enabling
the exact computation of posterior probabilities, expectation, variance, and
higher moments. Our inference method is provably correct and fully automated in
a tool called Genfer, which uses automatic differentiation (specifically,
Taylor polynomials), but does not require computer algebra. Our experiments
show that Genfer is often faster than the existing exact inference tools PSI,
Dice, and Prodigy. On a range of real-world inference problems that none of
these exact tools can solve, Genfer's performance is competitive with
approximate Monte Carlo methods, while avoiding approximation errors.
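The central tool is easy to demonstrate in miniature. The sketch below (a hand-rolled toy, not Genfer's Taylor-mode machinery) represents a distribution on {0, ..., n} by its PGF coefficients and reads moments off derivatives at 1:

```python
import numpy as np

def pgf_mean_var(coeffs):
    """For G(x) = sum_k p_k x^k: mean = G'(1) and
    variance = G''(1) + G'(1) - G'(1)^2."""
    k = np.arange(len(coeffs))
    g1 = np.sum(k * coeffs)            # G'(1)
    g2 = np.sum(k * (k - 1) * coeffs)  # G''(1)
    return g1, g2 + g1 - g1 ** 2

# Binomial(n=10, p=0.3): its PGF is (0.7 + 0.3 x)^10.  Expanding the
# polynomial by repeated convolution yields the pmf as the coefficients.
coeffs = np.array([1.0])
for _ in range(10):
    coeffs = np.convolve(coeffs, [0.7, 0.3])
mean, var = pgf_mean_var(coeffs)
print(mean, var)  # matches n*p = 3.0 and n*p*(1-p) = 2.1
```

Closed-form PGFs extend this beyond finite support (e.g., a Poisson has G(x) = exp(lambda (x - 1))), which is what enables exact inference for distributions with infinite support.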
( 2
min )
We study the training dynamics of a shallow neural network with quadratic
activation functions and quadratic cost in a teacher-student setup. In line
with previous works on the same neural architecture, the optimization is
performed following the gradient flow on the population risk, where the average
over data points is replaced by the expectation over their distribution,
assumed to be Gaussian. We first derive convergence properties for the gradient
flow and quantify the overparameterization that is necessary to achieve a
strong signal recovery. Then, assuming that the teachers and the students at
initialization form independent orthonormal families, we derive a
high-dimensional limit for the flow and show that the minimal
overparameterization is sufficient for strong recovery. We verify by numerical
experiments that these results hold for more general initializations.
( 2
min )
We study scalable machine learning models for full event reconstruction in
high-energy electron-positron collisions based on a highly granular detector
simulation. Particle-flow reconstruction can be formulated as a supervised
learning task using tracks and calorimeter clusters or hits. We compare a graph
neural network and kernel-based transformer and demonstrate that both avoid
quadratic memory allocation and computational cost while achieving realistic
reconstruction. We show that hyperparameter tuning on a supercomputer
significantly enhances the physics performance of the models, improving the jet
transverse momentum resolution by up to 50% compared to the baseline. The
resulting model is highly portable across hardware processors. Finally, we
demonstrate that the model can be trained on highly granular inputs consisting
of tracks and calorimeter hits, resulting in a competitive physics performance
with the baseline. Datasets and software to reproduce the studies are published
following the findable, accessible, interoperable, and reusable principles.
( 2
min )
The aim of this paper is to make clear and precise the relationship between
the Rubin causal model (RCM) and structural causal model (SCM) frameworks for
causal inference. Adopting a neutral logical perspective, and drawing on
previous work, we show what is required for an RCM to be representable by an
SCM. A key result then shows that every RCM -- including those that violate
algebraic principles implied by the SCM framework -- emerges as an abstraction
of some representable RCM. Finally, we illustrate the power of this
conciliatory perspective by pinpointing an important role for SCM principles in
classic applications of RCMs; conversely, we offer a characterization of the
algebraic constraints implied by a graph, helping to substantiate further
comparisons between the two frameworks.
( 2
min )
We study versions of Hilbert's projective metric for spaces of integrable
functions of bounded growth. These metrics originate from cones which are
relaxations of the cone of all non-negative functions, in the sense that they
include all functions having non-negative integral values when multiplied with
certain test functions. We show that kernel integral operators are contractions
with respect to suitable specifications of such metrics even for kernels which
are not bounded away from zero, provided that the decay to zero of the kernel
is controlled. As an application to entropic optimal transport, we show
exponential convergence of Sinkhorn's algorithm in settings where the marginal
distributions have sufficiently light tails compared to the growth of the cost
function.
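The convergence behaviour is easy to observe numerically. The toy below (our own illustration with Gaussian marginals and a quadratic cost, matching the light-tails setting in spirit only) records the row-marginal error of the Sinkhorn plan across iterations:

```python
import numpy as np

def sinkhorn(C, mu, nu, reg=0.5, iters=300):
    """Sinkhorn iterations for entropic optimal transport, recording the
    L1 error of the row marginal to expose the convergence behaviour."""
    K = np.exp(-C / reg)                 # Gibbs kernel
    v = np.ones_like(nu)
    errors = []
    for _ in range(iters):
        u = mu / (K @ v)                 # match the row marginal
        v = nu / (K.T @ u)               # match the column marginal
        P = u[:, None] * K * v[None, :]  # current transport plan
        errors.append(np.abs(P.sum(axis=1) - mu).sum())
    return errors

rng = np.random.default_rng(0)
n = 30
x, y = np.sort(rng.normal(size=n)), np.sort(rng.normal(size=n))
C = (x[:, None] - y[None, :]) ** 2       # quadratic cost, light-tailed marginals
mu = np.full(n, 1.0 / n)
nu = np.full(n, 1.0 / n)
errs = sinkhorn(C, mu, nu)
print(errs[0], errs[-1])  # the marginal error decays across iterations
```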
( 2
min )
In this post, we show you how to create a MAP connector to AWS HealthImaging, which is reusable in applications built with the MONAI Deploy App SDK, to integrate with and accelerate image data retrieval from a cloud-native DICOM store to medical imaging AI workloads. The MONAI Deploy SDK can be used to support hospital operations. We also demonstrate two hosting options to deploy MAP AI applications on SageMaker at scale.
( 10
min )
This post explores how Amazon CodeWhisperer can help with code optimization for sustainability through increased resource efficiency. Computationally resource-efficient coding is one technique that aims to reduce the amount of energy required to process a line of code and, as a result, aid companies in consuming less energy overall. In this era of cloud computing, […]
( 8
min )
NVIDIA’s AI platform raised the bar for AI training and high performance computing in the latest MLPerf industry benchmarks. Among many new records and milestones, one in generative AI stands out: NVIDIA Eos — an AI supercomputer powered by a whopping 10,752 NVIDIA H100 Tensor Core GPUs and NVIDIA Quantum-2 InfiniBand networking — completed a Read article >
( 7
min )
When patients in Vietnam enter a medical facility in distress, doctors use NVIDIA technology to get more accurate scans to diagnose their ailments. In Hong Kong, a different set of doctors leverage generative AI to discover new cures for patients. Improving the health and well-being of citizens and strengthening economies and communities are key themes Read article >
( 6
min )
Clinician-led healthcare AI company Harrison.ai has built an AI system that effectively serves as a “spell checker” for radiologists — flagging critical findings to improve the speed and accuracy of radiology image analysis, reducing misdiagnoses. In the latest episode of NVIDIA’s AI Podcast, host Noah Kravitz spoke with Harrison.ai cofounder and CEO Aengus Tran about Read article >
( 6
min )
Neural Radiance Fields (NeRF) enable 3D scene reconstruction from 2D images
and camera poses for Novel View Synthesis (NVS). Although NeRF can produce
photorealistic results, it often suffers from overfitting to training views,
leading to poor geometry reconstruction, especially in low-texture areas. This
limitation restricts many important applications which require accurate
geometry, such as extrapolated NVS, HD mapping and scene editing. To address
this limitation, we propose a new method to improve NeRF's 3D structure using
only RGB images and semantic maps. Our approach introduces a novel plane
regularization based on Singular Value Decomposition (SVD), that does not rely
on any geometric prior. In addition, we leverage the Structural Similarity
Index Measure (SSIM) in our loss design to properly initialize the volumetric
representation of NeRF. Quantitative and qualitative results show that our
method outperforms popular regularization approaches in accurate geometry
reconstruction for large-scale outdoor scenes and achieves SoTA rendering
quality on the KITTI-360 NVS benchmark.
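The SVD-based plane regularization can be sketched in a few lines: for a patch of 3-D points expected to lie on a plane, the smallest singular value of the centered point matrix vanishes exactly when the points are coplanar, so it can serve as a planarity penalty. This is an illustrative sketch under that reading, not the paper's exact loss.

```python
import numpy as np

def svd_plane_residual(points):
    """Planarity penalty for an (n, 3) patch of 3-D points: the smallest
    singular value of the centered point matrix is zero exactly when the
    points are coplanar. Illustrative sketch, not the paper's loss term."""
    centered = points - points.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    return singular_values[-1]
```

Minimizing such a residual over patches sampled from low-texture regions encourages locally planar geometry without requiring any geometric prior.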
( 2
min )
A significant challenge facing researchers in the area of multi-agent
reinforcement learning (MARL) pertains to the identification of a library that
can offer fast and compatible development for multi-agent tasks and algorithm
combinations, while obviating the need to consider compatibility issues. In
this paper, we present MARLlib, a library designed to address the
aforementioned challenge by leveraging three key mechanisms: 1) a standardized
multi-agent environment wrapper, 2) an agent-level algorithm implementation,
and 3) a flexible policy mapping strategy. By utilizing these mechanisms,
MARLlib can effectively disentangle the intertwined nature of the multi-agent
task and the learning process of the algorithm, with the ability to
automatically alter the training strategy based on the current task's
attributes. The MARLlib library's source code is publicly accessible on GitHub:
\url{https://github.com/Replicable-MARL/MARLlib}.
( 2
min )
A quantum thermal machine is an open quantum system that enables the
conversion between heat and work at the micro or nano-scale. Optimally
controlling such out-of-equilibrium systems is a crucial yet challenging task
with applications to quantum technologies and devices. We introduce a general
model-free framework based on Reinforcement Learning to identify
out-of-equilibrium thermodynamic cycles that are Pareto optimal trade-offs
between power and efficiency for quantum heat engines and refrigerators. The
method does not require any knowledge of the quantum thermal machine, nor of
the system model, nor of the quantum state. Instead, it only observes the heat
fluxes, so it is both applicable to simulations and experimental devices. We
test our method on a model of an experimentally realistic refrigerator based on
a superconducting qubit, and on a heat engine based on a quantum harmonic
oscillator. In both cases, we identify the Pareto-front representing optimal
power-efficiency tradeoffs, and the corresponding cycles. Such solutions
outperform previous proposals made in the literature, such as optimized Otto
cycles, reducing quantum friction.
( 2
min )
In this paper, we introduce faster first-order primal-dual algorithms for
minimizing a convex function subject to strongly convex function constraints.
Before our work, the best complexity bound was $\mathcal{O}(1/{\varepsilon})$,
and it remained unclear how to improve this bound by leveraging the strong
convexity assumption. We address this issue by developing novel techniques to
progressively estimate the strong convexity of the Lagrangian function. Our
approach yields an improved complexity of $\mathcal{O}(1/\sqrt{\varepsilon})$,
matching the complexity lower bound for strongly-convex-concave saddle point
optimization. We show the superior performance of our methods in
sparsity-inducing constrained optimization, notably Google's personalized
PageRank problem. Furthermore, we show that a restarted version of the proposed
methods can effectively identify the sparsity pattern of the optimal solution
within a finite number of steps, a result that appears to have independent
significance.
( 2
min )
Imitation learning of robot policies from few demonstrations is crucial in
open-ended applications. We propose a new method, Interaction Warping, for
learning SE(3) robotic manipulation policies from a single demonstration. We
infer the 3D mesh of each object in the environment using shape warping, a
technique for aligning point clouds across object instances. Then, we represent
manipulation actions as keypoints on objects, which can be warped with the
shape of the object. We show successful one-shot imitation learning on three
simulated and real-world object re-arrangement tasks. We also demonstrate the
ability of our method to predict object meshes and robot grasps in the wild.
( 2
min )
Interatomic potentials learned using machine learning methods have been
successfully applied to atomistic simulations. However, accurate models require
large training datasets, while generating reference calculations is
computationally demanding. To bypass this difficulty, we propose a transfer
learning algorithm that leverages the ability of graph neural networks (GNNs)
to represent chemical environments together with kernel mean embeddings. We
extract a feature map from GNNs pre-trained on the OC20 dataset and use it to
learn the potential energy surface from system-specific datasets of catalytic
processes. Our method is further enhanced by incorporating into the kernel the
chemical species information, resulting in improved performance and
interpretability. We test our approach on a series of realistic datasets of
increasing complexity, showing excellent generalization and transferability
performance, and improving on methods that rely on GNNs or ridge regression
alone, as well as similar fine-tuning approaches.
( 2
min )
Weakly supervised semantic segmentation (WSSS) aims to bypass the need for
laborious pixel-level annotation by using only image-level annotation. Most
existing methods rely on Class Activation Maps (CAM) to derive pixel-level
pseudo-labels and use them to train a fully supervised semantic segmentation
model. Although these pseudo-labels are class-aware, indicating the coarse
regions for particular classes, they are not object-aware and fail to delineate
accurate object boundaries. To address this, we introduce a simple yet
effective method harnessing the Segment Anything Model (SAM), a class-agnostic
foundation model capable of producing fine-grained instance masks of objects,
parts, and subparts. We use CAM pseudo-labels as cues to select and combine SAM
masks, resulting in high-quality pseudo-labels that are both class-aware and
object-aware. Our approach is highly versatile and can be easily integrated
into existing WSSS methods without any modification. Despite its simplicity,
our approach shows consistent gain over the state-of-the-art WSSS methods on
both PASCAL VOC and MS-COCO datasets.
( 2
min )
Convolutional neural networks require good algorithms to reduce complexity and
efficient use of parallel processors for acceleration. Within convolutional
layers there are three types of operators: convolution, used in forward
propagation, and deconvolution and dilated convolution, used in backward
propagation. During the execution of these operators, zeros are
typically added to tensors, leading to redundant calculations and unnecessary
strain on hardware. To circumvent these inefficiencies, we propose the C-K-S
algorithm, accompanied by efficient GPU implementations. C-K-S trims filters to
exclude zero-padding. For deconvolution and dilated-convolution, C-K-S
transforms sparse tensors into dense tensors, and standardizes the local
computational rules to simplify the hardware control. The experimental results
demonstrate that C-K-S offers good performance in terms of speed and
convergence, surpassing the capabilities of PyTorch and cuDNN in certain
scenarios.
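The zero-padding redundancy the abstract refers to is easy to see in one dimension: a dilated convolution is equivalent to an ordinary convolution with a kernel that has zeros inserted between its taps, and every inserted zero costs a useless multiply. A toy numpy illustration of that equivalence (not the C-K-S algorithm itself):

```python
import numpy as np

def dilate_kernel(k, d):
    """Insert d-1 zeros between kernel taps; dense convolution with this
    kernel wastes one multiply per inserted zero."""
    out = np.zeros((len(k) - 1) * d + 1)
    out[::d] = k
    return out

def correlate1d_valid(x, k):
    """Plain 'valid' cross-correlation, as used in deep learning frameworks."""
    n = len(x) - len(k) + 1
    return np.array([x[i:i + len(k)] @ k for i in range(n)])
```

Gathering the strided inputs directly (`x[i + d*j] * k[j]`) gives the same result with no wasted work, which is the kind of dense restructuring the proposed GPU implementation performs.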
( 2
min )
This work introduces the first small-loss and gradual-variation regret bounds
for online portfolio selection, marking the first instances of data-dependent
bounds for online convex optimization with non-Lipschitz, non-smooth losses.
The algorithms we propose exhibit sublinear regret rates in the worst cases and
achieve logarithmic regrets when the data is "easy," with per-iteration time
almost linear in the number of investment alternatives. The regret bounds are
derived using novel smoothness characterizations of the logarithmic loss, a
local norm-based analysis of follow-the-regularized-leader (FTRL) with
self-concordant regularizers, which are not necessarily barriers, and an
implicit variant of optimistic FTRL with the log-barrier.
( 2
min )
We demonstrate a validity problem of machine learning in the vital
application area of disease diagnosis in medicine. It arises when target labels
in training data are determined by an indirect measurement, and the fundamental
measurements needed to determine this indirect measurement are included in the
input data representation. Machine learning models trained on this data will
learn nothing else but to exactly reconstruct the known target definition. Such
models show perfect performance on similarly constructed test data but will
fail catastrophically on real-world examples where the defining fundamental
measurements are not or only incompletely available. We present a general
procedure allowing identification of problematic datasets and black-box machine
learning models trained on them, and exemplify our detection procedure on the
task of early prediction of sepsis.
( 2
min )
Estimating a prediction function is a fundamental component of many data
analyses. The Super Learner ensemble, a particular implementation of stacking,
has desirable theoretical properties and has been used successfully in many
applications. Dimension reduction can be accomplished by using variable
screening algorithms, including the lasso, within the ensemble prior to fitting
other prediction algorithms. However, the performance of a Super Learner using
the lasso for dimension reduction has not been fully explored in cases where
the lasso is known to perform poorly. We provide empirical results that suggest
that a diverse set of candidate screening algorithms should be used to protect
against poor performance of any one screen, similar to the guidance for
choosing a library of prediction algorithms for the Super Learner.
( 2
min )
Kernel density estimation (KDE) is integral to a range of generative and
discriminative tasks in machine learning. Drawing upon tools from the
multidimensional calculus of variations, we derive an optimal weight function
that reduces bias in standard kernel density estimates for density ratios,
leading to improved estimates of prediction posteriors and
information-theoretic measures. In the process, we shed light on some
fundamental aspects of density estimation, particularly from the perspective of
algorithms that employ KDEs as their main building blocks.
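For context, the naive plug-in estimator whose bias the paper targets is simply the ratio of two standard kernel density estimates. A minimal 1-D sketch (bandwidth and setup are illustrative, not the paper's optimal weighting):

```python
import numpy as np

def gaussian_kde(samples, x, h):
    """Plain Gaussian kernel density estimate of the sample density at
    the query points x (all 1-D)."""
    z = (x[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z ** 2).mean(axis=1) / (h * np.sqrt(2.0 * np.pi))

def kde_density_ratio(samples_p, samples_q, x, h=0.3):
    """Naive plug-in estimate of the density ratio p(x) / q(x); the
    paper derives a weight function that reduces the bias of exactly
    this kind of estimator."""
    return gaussian_kde(samples_p, x, h) / gaussian_kde(samples_q, x, h)
```

Because the smoothing bias enters both numerator and denominator, plug-in ratios can be badly biased where the densities differ in curvature, which is the gap an optimal weight function addresses.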
( 2
min )
We propose the Kuramoto Graph Neural Network (KuramotoGNN), a novel class of
continuous-depth graph neural networks (GNNs) that employs the Kuramoto model
to mitigate the over-smoothing phenomenon, in which node features in GNNs
become indistinguishable as the number of layers increases. The Kuramoto model
captures the synchronization behavior of non-linear coupled oscillators. Viewing
node features as coupled oscillators, we first show the connection between the
Kuramoto model and a basic GNN, and then show that the over-smoothing phenomenon
in GNNs can be interpreted as phase synchronization in the Kuramoto model. The
KuramotoGNN
replaces this phase synchronization with frequency synchronization to prevent
the node features from converging into each other while allowing the system to
reach a stable synchronized state. We experimentally verify the advantages of
the KuramotoGNN over the baseline GNNs and existing methods in reducing
over-smoothing on various graph deep learning benchmark tasks.
( 2
min )
In biomedical applications it is often necessary to estimate a physiological
response to a treatment consisting of multiple components, and learn the
separate effects of the components in addition to the joint effect. Here, we
extend existing probabilistic nonparametric approaches to explicitly address
this problem. We also develop a new convolution-based model for composite
treatment-response curves that is more biologically interpretable. We validate
our models by estimating the impact of carbohydrate and fat in meals on blood
glucose. By differentiating treatment components, incorporating their dosages,
and sharing statistical information across patients via a hierarchical
multi-output Gaussian process, our method improves prediction accuracy over
existing approaches, and allows us to interpret the different effects of
carbohydrates and fat on the overall glucose response.
( 2
min )
We show that the likelihood function for a multinomial vector observed under
arbitrary interval censoring constraints on the frequencies or their partial
sums is completely log-concave by proving that the constrained sample spaces
comprise M-convex subsets of the discrete simplex.
( 2
min )
This paper studies Anderson acceleration (AA) for fixed-point methods
${x}^{(k+1)}=q({x}^{(k)})$. It provides the first proof that when the operator
$q$ is linear and symmetric, AA improves the root-linear convergence factor
over the fixed-point iterations. When $q$ is nonlinear, yet has a symmetric
Jacobian at the solution, a slightly modified AA algorithm is proved to have an
analogous root-linear convergence factor improvement over fixed-point
iterations. Simulations verify our observations. Furthermore, experiments with
different data models demonstrate AA is significantly superior to the standard
fixed-point methods for Tyler's M-estimation.
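A generic windowed Anderson acceleration scheme for $x^{(k+1)} = q(x^{(k)})$ can be sketched as follows. This is the standard textbook variant (depth-$m$ least-squares mixing of past residuals), not the paper's modified algorithm:

```python
import numpy as np

def anderson(q, x0, m=5, iters=100, tol=1e-12):
    """Windowed (depth-m) Anderson acceleration for x <- q(x).
    Generic textbook variant; illustrative, not the paper's algorithm."""
    x = np.asarray(x0, dtype=float)
    Xs, Rs = [], []                       # histories of q(x) and residuals
    for _ in range(iters):
        qx = q(x)
        r = qx - x                        # fixed-point residual
        if np.linalg.norm(r) < tol:
            return qx
        Xs.append(qx)
        Rs.append(r)
        if len(Rs) > m + 1:
            Xs.pop(0)
            Rs.pop(0)
        if len(Rs) == 1:
            x = qx                        # plain fixed-point step to start
            continue
        # Least-squares fit of the latest residual by past residual differences
        dR = np.stack([Rs[i + 1] - Rs[i] for i in range(len(Rs) - 1)], axis=1)
        dX = np.stack([Xs[i + 1] - Xs[i] for i in range(len(Xs) - 1)], axis=1)
        gamma, *_ = np.linalg.lstsq(dR, r, rcond=None)
        x = qx - dX @ gamma
    return x
```

For a linear symmetric operator $q(x) = Ax + b$ with spectral radius below one, this recovers the fixed point $(I - A)^{-1} b$ far faster than plain iteration, which is the regime the paper's convergence-factor result addresses.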
( 2
min )
Large language models (LLMs) with their broad knowledge, can generate human-like text on almost any topic. However, their training on massive datasets also limits their usefulness for specialized tasks. Without continued learning, these models remain oblivious to new data and trends that emerge after their initial training. Furthermore, the cost to train new LLMs can […]
( 14
min )
This research paper was presented at the 64th IEEE Symposium on Foundations of Computer Science (FOCS) 2023, a premier forum for the latest research in theoretical computer science. Submodular functions are versatile mathematical tools, finding diverse applications in real-world scenarios and guiding solutions across complex domains. From dissecting the intricate networks […]
The post Toward developing faster algorithms for minimizing submodular functions appeared first on Microsoft Research.
( 10
min )
Taiwanese artist Steven Tung creates captivating 2D and 3D digital art that explores sci-fi, minimalism and realism and pushes artistic boundaries.
( 6
min )
The expressivity of Graph Neural Networks (GNNs) can be entirely
characterized by appropriate fragments of first-order logic. Namely, any
query of the two-variable fragment of graded modal logic (GC2) interpreted over
labeled graphs can be expressed using a GNN whose size depends only on the
depth of the query. As pointed out by [Barcelo et al., 2020; Grohe, 2021], this
description holds for a family of activation functions, leaving the possibility
for a hierarchy of logics expressible by GNNs depending on the chosen
activation function. In this article, we show that such a hierarchy indeed exists
by proving that GC2 queries cannot be expressed by GNNs with polynomial
activation functions. This implies a separation between polynomial and popular
non-polynomial activations (such as ReLU, sigmoid, hyperbolic tangent, and
others) and answers an open question formulated by [Grohe, 2021].
( 2
min )
Quantifying the difference between two probability density functions, $p$ and
$q$, using available data, is a fundamental problem in Statistics and Machine
Learning. A usual approach for addressing this problem is the likelihood-ratio
estimation (LRE) between $p$ and $q$, which -- to our best knowledge -- has
been investigated mainly for the offline case. This paper contributes by
introducing a new framework for online non-parametric LRE (OLRE) for the
setting where pairs of iid observations $(x_t \sim p, x'_t \sim q)$ are
observed over time. The non-parametric nature of our approach has the advantage
of being agnostic to the forms of $p$ and $q$. Moreover, we capitalize on the
recent advances in Kernel Methods and functional minimization to develop an
estimator that can be efficiently updated online. We provide theoretical
guarantees for the performance of the OLRE method along with empirical
validation in synthetic experiments.
( 2
min )
An emerging new paradigm for solving inverse problems is via the use of deep
learning to learn a regularizer from data. This leads to high-quality results,
but often at the cost of provable guarantees. In this work, we show how
well-posedness and convergent regularization arises within the convex-nonconvex
(CNC) framework for inverse problems. We introduce a novel input weakly convex
neural network (IWCNN) construction to adapt the method of learned adversarial
regularization to the CNC framework. Empirically we show that our method
overcomes numerical issues of previous adversarial methods.
( 2
min )
Optical computing systems can provide high-speed and low-energy data
processing, but they face two obstacles: computationally demanding training and
a simulation-to-reality gap. We propose a model-free solution for lightweight in
situ optimization of optical computing systems based on the score gradient
estimation algorithm. This approach treats the system as a black box and
back-propagates loss directly to the optical weights' probabilistic
distributions, hence circumventing the need for computation-heavy and biased
system simulation. We demonstrate a superior classification accuracy on the
MNIST and FMNIST datasets through experiments on a single-layer diffractive
optical computing system. Furthermore, we show its potential for image-free and
high-speed cell analysis. The inherent simplicity of our proposed method,
combined with its low demand for computational resources, expedites the
transition of optical computing from laboratory demonstrations to real-world
applications.
( 2
min )
Restricting the variance of a policy's return is a popular choice in
risk-averse Reinforcement Learning (RL) due to its clear mathematical
definition and easy interpretability. Traditional methods directly restrict the
total return variance. Recent methods restrict the per-step reward variance as
a proxy. We thoroughly examine the limitations of these variance-based methods,
such as sensitivity to numerical scale and hindering of policy learning, and
propose to use an alternative risk measure, Gini deviation, as a substitute. We
study various properties of this new risk measure and derive a policy gradient
algorithm to minimize it. Empirical evaluation in domains where risk aversion
can be clearly defined shows that our algorithm can mitigate the limitations
of variance-based risk measures and achieves high return with low risk in terms
of variance and Gini deviation when others fail to learn a reasonable policy.
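A key property that distinguishes Gini deviation from variance is its behavior under rescaling: it grows linearly with the returns rather than quadratically, which sidesteps the numerical-scale sensitivity mentioned above. A minimal sample-based sketch, using one common convention (half the mean absolute difference over all ordered pairs); the paper's exact definition may differ:

```python
import numpy as np

def gini_deviation(x):
    """Gini deviation of a sample, taken here as half the mean absolute
    difference over all ordered pairs of observations. One common
    convention; illustrative only."""
    x = np.asarray(x, dtype=float)
    return 0.5 * np.abs(x[:, None] - x[None, :]).mean()
```

Like variance it is translation-invariant, but scaling all returns by a factor c scales the Gini deviation by c, not c squared.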
( 2
min )
We show how to "compile" human-readable programs into standard decoder-only
transformer models. Our compiler, Tracr, generates models with known structure.
This structure can be used to design experiments. For example, we use it to
study "superposition" in transformers that execute multi-step algorithms.
Additionally, the known structure of Tracr-compiled models can serve as
ground-truth for evaluating interpretability methods. Commonly, because the
"programs" learned by transformers are unknown it is unclear whether an
interpretation succeeded. We demonstrate our approach by implementing and
examining programs including computing token frequencies, sorting, and
parenthesis checking. We provide an open-source implementation of Tracr at
https://github.com/google-deepmind/tracr.
( 2
min )
In gradient descent dynamics of neural networks, the top eigenvalue of the
Hessian of the loss (sharpness) displays a variety of robust phenomena
throughout training. This includes early time regimes where the sharpness may
decrease during early periods of training (sharpness reduction), and later time
behavior such as progressive sharpening and edge of stability. We demonstrate
that a simple $2$-layer linear network (UV model) trained on a single training
example exhibits all of the essential sharpness phenomenology observed in
real-world scenarios. By analyzing the structure of dynamical fixed points in
function space and the vector field of function updates, we uncover the
underlying mechanisms behind these sharpness trends. Our analysis reveals (i)
the mechanism behind early sharpness reduction and progressive sharpening, (ii)
the required conditions for edge of stability, and (iii) a period-doubling
route to chaos on the edge of stability manifold as learning rate is increased.
Finally, we demonstrate that various predictions from this simplified model
generalize to real-world scenarios and discuss its limitations.
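The UV model is small enough to simulate directly: a 2-layer linear network f(x) = u·v·x with squared loss on a single example, tracking the top Hessian eigenvalue (sharpness) along gradient descent. The constants below are illustrative choices, not taken from the paper; with a small learning rate and a small initial product u·v, the run exhibits progressive sharpening.

```python
import numpy as np

# Toy "UV model": 2-layer linear net f(x) = u * v * x, squared loss on one
# example (x, y). Hedged sketch of the setting; constants are illustrative.
def top_hessian_eigenvalue(u, v, x, y):
    """Sharpness: top eigenvalue of the loss 0.5*(u*v*x - y)^2 w.r.t. (u, v)."""
    r = u * v * x - y                                  # residual
    H = np.array([[(v * x) ** 2, r * x + u * v * x ** 2],
                  [r * x + u * v * x ** 2, (u * x) ** 2]])
    return np.linalg.eigvalsh(H)[-1]

x, y, lr = 1.0, 2.0, 0.01
u, v = 0.5, 1.5                                        # initial product u*v < y
sharpness = []
for _ in range(500):
    sharpness.append(top_hessian_eigenvalue(u, v, x, y))
    r = u * v * x - y
    u, v = u - lr * r * v * x, v - lr * r * u * x      # simultaneous GD step
```

Raising the learning rate toward 2 divided by the final sharpness pushes the same system to the edge of stability, the second regime the paper analyzes.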
( 2
min )
A stochastic process that arises by composing a function with a Markov
process is called an aggregated Markov process (AMP). The purpose of composing
a Markov process with a function can be a reduction of dimensions, e.g., a
projection onto certain coordinates. The theory around AMPs has been
extensively studied, e.g., by Dynkin, Cameron, Rogers and Pitman, and Kelly, all
of whom
provided sufficient conditions for an AMP to remain Markov. In another
direction, Larget provided a canonical representation for AMP, which can be
used to verify the equivalence of two AMPs. The purpose of this paper is to
describe how the theory of AMPs can be applied to stochastic learning models as
they learn a particular task.
( 2
min )
Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, and data from scanned documents. Queries is a feature that enables you to extract specific pieces of information from varying, complex documents using natural language. Custom Queries provides a way for you to customize the Queries feature for your business-specific, non-standard documents […]
( 9
min )
We are excited to announce that Amazon SageMaker JumpStart can now stream large language model (LLM) inference responses. Token streaming allows you to see the model response output as it is being generated instead of waiting for LLMs to finish the response generation before it is made available for you to use or display. The […]
( 7
min )
GPT-4 Turbo with 128K context and lower prices, the new Assistants API, GPT-4 Turbo with Vision, DALL·E 3 API, and more.
( 7
min )
High-fidelity simulators that connect theoretical models with observations
are indispensable tools in many sciences. When coupled with machine learning, a
simulator makes it possible to infer the parameters of a theoretical model
directly from real and simulated observations without explicit use of the
likelihood function. This is of particular interest when the latter is
intractable. In this work, we introduce a simple extension of the recently
proposed likelihood-free frequentist inference (LF2I) approach that has some
computational advantages. Like LF2I, this extension yields provably valid
confidence sets in parameter inference problems in which a high-fidelity
simulator is available. The utility of our algorithm is illustrated by applying
it to three pedagogically interesting examples: the first is from cosmology,
the second from high-energy physics and astronomy, both with tractable
likelihoods, while the third, with an intractable likelihood, is from
epidemiology.
( 2
min )
To quantify uncertainty, conformal prediction methods are gaining
continuously more interest and have already been successfully applied to
various domains. However, they are difficult to apply to time series as the
autocorrelative structure of time series violates basic assumptions required by
conformal prediction. We propose HopCPT, a novel conformal prediction approach
for time series that not only copes with temporal structures but leverages
them. We show that our approach is theoretically well justified for time series
where temporal dependencies are present. In experiments, we demonstrate that
our new approach outperforms state-of-the-art conformal prediction methods on
multiple real-world time series datasets from four different domains.
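The baseline that time series break is standard split conformal prediction: calibrate a quantile of absolute residuals on held-out data, then pad the point forecast by it. A minimal sketch of that baseline (not HopCPT itself), whose coverage guarantee rests on exchangeability, exactly the assumption autocorrelation violates:

```python
import numpy as np

def split_conformal_interval(cal_residuals, y_pred, alpha=0.1):
    """Standard split conformal interval from calibration residuals.
    Valid under exchangeability -- the assumption that autocorrelated
    time series violate, motivating temporally aware methods."""
    n = len(cal_residuals)
    k = int(np.ceil((n + 1) * (1 - alpha)))    # conformal quantile index
    qhat = np.sort(np.abs(cal_residuals))[k - 1]
    return y_pred - qhat, y_pred + qhat
```

Because the interval width is a single global quantile, this baseline also cannot adapt to locally easy or hard regimes in a series, which is what leveraging temporal structure buys.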
( 2
min )
Manifolds discovered by machine learning models provide a compact
representation of the underlying data. Geodesics on these manifolds define
locally length-minimising curves and provide a notion of distance, which are
key for reduced-order modelling, statistical inference, and interpolation. In
this work, we propose a model-based parameterisation for distance fields and
geodesic flows on manifolds, exploiting solutions of a manifold-augmented
Eikonal equation. We demonstrate how the geometry of the manifold impacts the
distance field, and exploit the geodesic flow to obtain globally
length-minimising curves directly. This work opens opportunities for statistics
and reduced-order modelling on differentiable manifolds.
( 2
min )
In recent years, federated minimax optimization has attracted growing
interest due to its extensive applications in various machine learning tasks.
While Smoothed Alternating Gradient Descent Ascent (Smoothed-AGDA) has proved
successful in centralized nonconvex minimax optimization, whether and how the
smoothing technique could help in the federated setting remains unexplored.
In this paper, we propose a new algorithm termed Federated Stochastic Smoothed
Gradient Descent Ascent (FESS-GDA), which utilizes the smoothing technique for
federated minimax optimization. We prove that FESS-GDA can be uniformly used to
solve several classes of federated minimax problems and prove new or better
analytical convergence results for these settings. We showcase the practical
efficiency of FESS-GDA in practical federated learning tasks of training
generative adversarial networks (GANs) and fair classification.
( 2
min )
We introduce Resilient Multiple Choice Learning (rMCL), an extension of the
MCL approach for conditional distribution estimation in regression settings
where multiple targets may be sampled for each training input. Multiple Choice
Learning is a simple framework to tackle multimodal density estimation, using
the Winner-Takes-All (WTA) loss for a set of hypotheses. In regression
settings, the existing MCL variants focus on merging the hypotheses, thereby
eventually sacrificing the diversity of the predictions. In contrast, our
method relies on a novel learned scoring scheme underpinned by a mathematical
framework based on Voronoi tessellations of the output space, from which we can
derive a probabilistic interpretation. After empirically validating rMCL with
experiments on synthetic data, we further assess its merits on the sound source
localization problem, demonstrating its practical usefulness and the relevance
of its interpretation.
( 2
min )
In this paper, we focus on the data-driven discovery of a general
second-order particle-based model that contains many state-of-the-art models
for modeling the aggregation and collective behavior of interacting agents of
similar size and body type. This model takes the form of a high-dimensional
system of ordinary differential equations parameterized by two interaction
kernels that appraise the alignment of positions and velocities. We propose a
Gaussian Process-based approach to this problem, where the unknown model
parameters are marginalized by using two independent Gaussian Process (GP)
priors on latent interaction kernels constrained to dynamics and observational
data. This results in a nonparametric model for interacting dynamical systems
that accounts for uncertainty quantification. We also develop acceleration
techniques to improve scalability. Moreover, we perform a theoretical analysis
to interpret the methodology and investigate the conditions under which the
kernels can be recovered. We demonstrate the effectiveness of the proposed
approach on various prototype systems, including the selection of the order of
the systems and the types of interactions. In particular, we present
applications to modeling two real-world fish motion datasets that display
flocking and milling patterns up to 248 dimensions. Despite the use of small
data sets, the GP-based approach learns an effective representation of the
nonlinear dynamics in these spaces and outperforms competitor methods.
( 3
min )
Embodying the convergence of AI and academia, the University of Florida on Friday inaugurated the Malachowsky Hall for Data Science & Information Technology. The sleek, seven-story building is poised to play a pivotal role in UF’s ongoing efforts to harness the transformative power of AI, reaffirming its stature as one of the nation’s leading public universities. […]
( 6
min )
The world’s 5 billion internet users and nearly 54 billion devices generate 3.4 petabytes of data per second, according to IDC. As digitalization accelerates, enterprise IT teams are under greater pressure to identify and block incoming cyber threats to ensure business operations and services are not interrupted — and AI-based cybersecurity provides a reliable way […]
( 11
min )
There’s a kind of magic that surrounds a soccer shot so powerful, it leaves spectators, players, and even commentators in a momentary state of awe. Think back to a moment when the sheer force of a strike left an entire Bundesliga stadium buzzing with energy. What exactly captures our imagination with such intensity? While there […]
( 10
min )
Thirteen new graduate student fellows will pursue exciting new paths of knowledge and discovery.
( 14
min )
Rama Ramakrishnan helps companies explore the promises and perils of large language models and other transformative AI technologies.
( 10
min )
Amazon SageMaker Canvas now supports deploying machine learning (ML) models to real-time inferencing endpoints, allowing you take your ML models to production and drive action based on ML-powered insights. SageMaker Canvas is a no-code workspace that enables analysts and citizen data scientists to generate accurate ML predictions for their business needs. Until now, SageMaker Canvas […]
( 6
min )
Recently, teachers and institutions have looked for different ways to incorporate artificial intelligence (AI) into their curriculums, whether it be teaching about machine learning (ML) or incorporating it into creating lesson plans, grading, or other educational applications. Generative AI models, in particular large language models (LLMs), have dramatically sped up AI’s impact on education. Generative […]
( 8
min )
AI technologies are having a massive impact across industries, including media and entertainment, automotive, customer service and more.
( 8
min )
Gear up with gratitude for more gaming time. GeForce NOW brings members a cornucopia of 15 newly supported games to the cloud this week. That’s just the start — there are a total of 54 titles coming in the month of November. Members can also join thousands of esports fans in the cloud with the Read article >
( 8
min )
Visual language processing (VLP) is at the forefront of generative AI, driving advancements in multimodal learning that encompasses language intelligence, vision understanding, and processing. Combined with large language models (LLM) and Contrastive Language-Image Pre-Training (CLIP) trained with a large quantity of multimodality data, visual language models (VLMs) are particularly adept at tasks like image captioning, […]
( 16
min )
Today, personally identifiable information (PII) is everywhere. PII is in emails, slack messages, videos, PDFs, and so on. It refers to any data or information that can be used to identify a specific individual. PII is sensitive in nature and includes various types of personal data, such as name, contact information, identification numbers, financial information, […]
( 8
min )
The home of the first industrial revolution just made a massive investment in the next one. The U.K. government has announced it will spend £225 million ($273 million) to build one of the world’s fastest AI supercomputers. Called Isambard-AI, it’s the latest in a series of systems named after a legendary 19th century British engineer Read article >
( 6
min )
Generative AI and large language models are stirring change across industries — but according to NVIDIA Senior Product Manager of Developer Marketing Annamalai Chockalingam, “we’re still in the early innings.” In the latest episode of NVIDIA’s AI Podcast, host Noah Kravitz spoke with Chockalingam about LLMs: what they are, their current state and their future Read article >
( 5
min )
Virtual fitting room software with AR and AI is the next best alternative to physical stores. With many different kinds of virtual fitting room solutions on offer though, it can be hard to know which ones are the most feasible for your business. Let’s talk about the various approaches to developing such solutions. Types of…
The post Approaches to creating virtual fitting room software using AR and AI appeared first on Data Science Central.
( 21
min )
The modern digital ecosystem, buzzing with the chatter of data and algorithms, presents both promises and challenges. In this intricate web, generative artificial intelligence (GenAI) shines as a beacon of innovation. To harness this power, enterprises need more than just cutting-edge technology. They need a bridge between ambition and realization—a role aptly filled by…
The post How technical program managers can build a robust Generative AI future appeared first on Data Science Central.
( 21
min )
Generative AI is revolutionizing our creative landscape, unlocking unprecedented possibilities. But at what cost? Dive into the ethical dilemmas of this transformative technology, exploring the fine line between innovation and ethical consideration. 2022 was a huge year for Generative AI. The release of DALL-E 2 in April showed the public the possibilities of text-to-image Gen…
The post Generative AI ethics: Navigating the boundary between human and machine creativity appeared first on Data Science Central.
( 23
min )
In the world’s largest solar race car event of the year, the University of New South Wales Sunswift Racing team is having its day in the sun. The World Solar Challenge, which first began some 35 years ago, attracts academic participants from across the globe. This year’s event drew nearly 100 competitors. The race runs Read article >
( 6
min )
The highly anticipated NVIDIA DLSS 3.5 update, including Ray Reconstruction for NVIDIA Omniverse — a platform for connecting and building custom 3D tools and apps — is now available.
( 7
min )
This post was co-written with Anthony Medeiros, Manager of Solutions Engineering and Architecture for North America Artificial Intelligence, and Blake Santschi, Business Intelligence Manager, from Schneider Electric. Additional Schneider Electric experts include Jesse Miller, Somik Chowdhury, Shaswat Babhulgaonkar, David Watkins, Mark Carlson and Barbara Sleczkowski. Enterprise Resource Planning (ERP) systems are used by companies to […]
( 10
min )
AI Weirdness: the strange side of machine learning
( 2
min )
Amazon Bedrock is a fully managed service provided by AWS that offers developers access to foundation models (FMs) and the tools to customize them for specific applications. It allows developers to build and scale generative AI applications using FMs through an API, without managing infrastructure. You can choose from various FMs from Amazon and leading […]
( 8
min )
We are excited to announce a simplified version of the Amazon SageMaker JumpStart SDK that makes it straightforward to build, train, and deploy foundation models. The code for prediction is also simplified. In this post, we demonstrate how you can use the simplified SageMaker JumpStart SDK to get started with using foundation models in just a couple of lines of code.
( 7
min )
Two roads diverged in a wood, and I; I took the one less traveled by, And that has made all the difference. — Robert Frost. At certain points in the evolution of enterprise artificial intelligence, there’s been a fork in the road. The road less traveled has suggested a different route to a more satisfying kind of…
The post FAIR knowledge: The key precondition for trusted generative AI appeared first on Data Science Central.
( 21
min )
A research paper released today describes ways generative AI can assist one of the most complex engineering efforts: designing semiconductors. The work demonstrates how companies in highly specialized fields can train large language models (LLMs) on their internal data to build assistants that increase productivity. Few pursuits are as challenging as semiconductor design. Under a Read article >
( 6
min )
Teachers are the backbone of any educational system. They are not just educators; they are indispensable navigators, mentors, and leaders. Teachers around the world face many challenges, which vary from country to country or even within a city or town. But some challenges are universal, including time management, classroom organization, and creating effective lesson plans. […]
The post Teachers in India help Microsoft Research design AI tool for creating great classroom content appeared first on Microsoft Research.
( 12
min )
Complementary approaches — “HighLight” and “Tailors and Swiftiles” — could boost the performance of demanding machine-learning tasks.
( 11
min )
The SecureLoop search tool efficiently identifies secure designs for hardware that can boost the performance of complex AI tasks, while requiring less energy.
( 10
min )
Two studies find “self-supervised” models, which learn about their environment from unlabeled data, can show activity patterns similar to those of the mammalian brain.
( 11
min )
Systematic reviews are vital for guiding practice, research, and policy, yet
they are often slow and labour-intensive. Large language models (LLMs) could
offer a way to speed up and automate systematic reviews, but their performance
in such tasks has not been comprehensively evaluated against humans, and no
study has tested GPT-4, the biggest LLM so far. This pre-registered study
evaluates GPT-4's capability in title/abstract screening, full-text review, and
data extraction across various literature types and languages using a
'human-out-of-the-loop' approach. Although GPT-4 had accuracy on par with human
performance in most tasks, results were skewed by chance agreement and dataset
imbalance. After adjusting for these, there was a moderate level of performance
for data extraction, and - barring studies that used highly reliable prompts -
screening performance levelled at none to moderate for different stages and
languages. When screening full-text literature using highly reliable prompts,
GPT-4's performance was 'almost perfect.' Penalising GPT-4 for missing key
studies using highly reliable prompts improved its performance even more. Our
findings indicate that, currently, substantial caution should be used if LLMs
are being used to conduct systematic reviews, but suggest that, for certain
systematic review tasks delivered under reliable prompts, LLMs can rival human
performance.
( 3
min )
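The chance-agreement adjustment described in the abstract above is typically done with Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), which discounts the agreement two raters would reach by labelling independently. A minimal sketch on hypothetical include/exclude screening decisions (not the study's data):

```python
def cohens_kappa(a, b):
    """Chance-corrected agreement between two raters' label sequences."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    labels = set(a) | set(b)
    # expected agreement if raters labelled independently with the same marginals
    p_e = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)

# hypothetical include(1)/exclude(0) screening decisions
human = [1, 1, 0, 0, 1, 0, 0, 0]
model = [1, 0, 0, 0, 1, 0, 1, 0]
print(round(cohens_kappa(human, model), 3))  # raw agreement 0.75, kappa ~ 0.467
```

The gap between raw accuracy (0.75) and kappa (~0.47) is exactly the kind of skew from chance agreement and class imbalance that the study adjusts for.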
We study the problem of designing adaptive multi-armed bandit algorithms that
perform optimally in both the stochastic setting and the adversarial setting
simultaneously (often known as a best-of-both-world guarantee). A line of
recent works shows that when configured and analyzed properly, the
Follow-the-Regularized-Leader (FTRL) algorithm, originally designed for the
adversarial setting, can in fact optimally adapt to the stochastic setting as
well. Such results, however, critically rely on an assumption that there exists
one unique optimal arm. Recently, Ito (2021) took the first step to remove such
an undesirable uniqueness assumption for one particular FTRL algorithm with the
$\frac{1}{2}$-Tsallis entropy regularizer. In this work, we significantly
improve and generalize this result, showing that uniqueness is unnecessary for
FTRL with a broad family of regularizers and a new learning rate schedule. For
some regularizers, our regret bounds also improve upon prior results even when
uniqueness holds. We further provide an application of our results to the
decoupled exploration and exploitation problem, demonstrating that our
techniques are broadly applicable.
( 3
min )
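For concreteness, the 1/2-Tsallis-entropy FTRL update mentioned above has a simple computational core: arm probabilities of the form p_i = 4 / (eta * (L_i - z))^2, where z < min(L_i) is a normaliser found by one-dimensional search. A sketch of that step only (the learning rate and loss estimates are illustrative; the paper's contribution is a new regulariser family and schedule, not this solver):

```python
def tsallis_inf_probs(losses, eta):
    """Arm probabilities for FTRL with the 1/2-Tsallis entropy regulariser:
    p_i = 4 / (eta * (L_i - z))^2, with z < min(losses) chosen so the p_i sum to 1."""
    K = len(losses)
    lo, hi = min(losses) - 4 * K / eta, min(losses) - 1e-12
    for _ in range(100):  # bisection: the sum of the p_i is increasing in z
        z = (lo + hi) / 2
        s = sum(4.0 / (eta * (L - z)) ** 2 for L in losses)
        if s > 1.0:
            hi = z
        else:
            lo = z
    p = [4.0 / (eta * (L - z)) ** 2 for L in losses]
    total = sum(p)
    return [pi / total for pi in p]

probs = tsallis_inf_probs([0.0, 0.5, 2.0], eta=1.0)  # lower loss -> higher probability
```

In the full bandit loop these probabilities drive arm sampling and importance-weighted loss estimates.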
Graph neural networks (GNNs) have become compelling models designed to
perform learning and inference on graph-structured data. However, little work
has been done to understand the fundamental limitations of GNNs for scaling to
larger graphs and generalizing to out-of-distribution (OOD) inputs. In this
paper, we use a random graph generator to systematically investigate how the
graph size and structural properties affect the predictive performance of GNNs.
We present specific evidence that the average node degree is a key feature in
determining whether GNNs can generalize to unseen graphs, and that the use of
multiple node update functions can improve the generalization performance of
GNNs when dealing with graphs of multimodal degree distributions. Accordingly,
we propose a multi-module GNN framework that allows the network to adapt
flexibly to new graphs by generalizing a single canonical nonlinear
transformation over aggregated inputs. Our results show that the multi-module
GNNs improve the OOD generalization on a variety of inference tasks in the
direction of diverse structural features.
( 2
min )
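The multi-module idea can be sketched as a message-passing layer that routes each node to a degree-dependent update function. The module choice, pivot degree, and scalar features below are illustrative assumptions, not the paper's architecture:

```python
def multi_module_layer(adj, h, low_deg_update, high_deg_update, pivot=3):
    """One message-passing step on scalar node features: mean-aggregate
    neighbours, then update with a module chosen by node degree."""
    out = {}
    for v, nbrs in adj.items():
        agg = sum(h[u] for u in nbrs) / len(nbrs) if nbrs else 0.0
        update = low_deg_update if len(nbrs) < pivot else high_deg_update
        out[v] = update(h[v], agg)
    return out

# toy star graph: hub 0 has high degree, leaves have degree 1
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
h = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
h1 = multi_module_layer(adj, h,
                        low_deg_update=lambda x, m: x + m,
                        high_deg_update=lambda x, m: 0.5 * (x + m))
```

Routing by degree lets high-degree and low-degree nodes learn different transformations, which is the intuition behind better generalization on multimodal degree distributions.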
The stochastic gradient descent (SGD) algorithm is the method of choice in many
machine learning tasks thanks to its scalability and efficiency in dealing with
large-scale problems. In this paper, we focus on the shuffling version of SGD
which matches the mainstream practical heuristics. We show the convergence to a
global solution of shuffling SGD for a class of non-convex functions under
over-parameterized settings. Our analysis employs more relaxed non-convex
assumptions than previous literature. Nevertheless, we maintain the desired
computational complexity as shuffling SGD has achieved in the general convex
setting.
( 2
min )
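The shuffling heuristic analysed above is simple to state: each epoch makes one pass over a fresh random permutation of the data (random reshuffling), rather than sampling indices with replacement. A minimal sketch on a toy scalar least-squares problem (the learning rate and epoch count are illustrative):

```python
import random

def shuffling_sgd(per_sample_grad, w, n, lr=0.1, epochs=20, seed=0):
    """Random-reshuffling SGD: each epoch visits all n samples in a new order."""
    rng = random.Random(seed)
    idx = list(range(n))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            w -= lr * per_sample_grad(w, i)
    return w

# f(w) = (1/2n) * sum_i (w - t_i)^2; per-sample gradient is w - t_i
targets = [1.0, 2.0, 3.0]
w = shuffling_sgd(lambda w, i: w - targets[i], 0.0, len(targets))
# w settles near the global minimiser, mean(targets) = 2.0
```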
We study the bias of Stochastic Gradient Descent (SGD) to learn low-rank
weight matrices when training deep neural networks. Our results show that
training neural networks with mini-batch SGD and weight decay causes a bias
towards rank minimization over the weight matrices. Specifically, we show, both
theoretically and empirically, that this bias is more pronounced when using
smaller batch sizes, higher learning rates, or increased weight decay.
Additionally, we predict and observe empirically that weight decay is necessary
to achieve this bias. Unlike previous literature, our analysis does not rely on
assumptions about the data, convergence, or optimality of the weight matrices
and applies to a wide range of neural network architectures of any width or
depth. Finally, we empirically investigate the connection between this bias and
generalization, finding that it has a marginal effect on generalization.
( 2
min )
This research underscores the efficacy of Fourier topological optimization in
refining MRI imagery, thereby bolstering the classification precision of
Alzheimer's Disease through convolutional neural networks. Recognizing that MRI
scans are indispensable for neurological assessments, but frequently grapple
with issues like blurriness and contrast irregularities, the deployment of
Fourier topological optimization offered enhanced delineation of brain
structures, ameliorated noise, and superior contrast. The applied techniques
prioritized boundary enhancement, contrast and brightness adjustments, and
overall image lucidity. Employing CNN architectures VGG16, ResNet50,
InceptionV3, and Xception, the post-optimization analysis revealed a marked
elevation in performance. Conclusively, the amalgamation of Fourier topological
optimization with CNNs delineates a promising trajectory for the nuanced
classification of Alzheimer's Disease, portending a transformative impact on
its diagnostic paradigms.
( 2
min )
As large language models (LLMs) are widely adopted, new safety issues and
policies emerge, to which existing safety classifiers do not generalize well.
If we have only observed a few examples of violations of a new safety rule, how
can we build a classifier to detect violations? In this paper, we study the
novel setting of domain-generalized few-shot learning for LLM-based text safety
classifiers. Unlike prior few-shot work, these new safety issues can be hard to
uncover and we do not get to choose the few examples. We demonstrate that
existing few-shot techniques do not perform well in this setting, and rather we
propose to do parameter-efficient fine-tuning (PEFT) combined with augmenting
training data based on similar examples in prior existing rules. We empirically
show that our approach of similarity-based data-augmentation + prompt-tuning
(DAPT) consistently outperforms baselines that either do not rely on data
augmentation or on PEFT by 7-17% F1 score in the Social Chemistry moral
judgement and 9-13% AUC in the Toxicity detection tasks, even when the new rule
is loosely correlated with existing ones.
( 2
min )
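The data-augmentation half of DAPT can be sketched as nearest-neighbour retrieval from prior rules' labelled data. The bag-of-words cosine below is a stand-in similarity measure, and the paper pairs this retrieval with parameter-efficient prompt tuning, which is not shown:

```python
import math
from collections import Counter

def cosine(a, b):
    """Bag-of-words cosine similarity between two texts (stand-in for a
    learned embedding similarity)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def augment(few_shot, prior_pool, k=2):
    """For each few-shot example of the new rule, add the k most similar
    labelled examples from prior rules to the training set."""
    extra = []
    for text, _label in few_shot:
        ranked = sorted(prior_pool, key=lambda ex: cosine(text, ex[0]), reverse=True)
        extra.extend(ranked[:k])
    return few_shot + extra
```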
A fundamental problem of causal discovery is cause-effect inference, learning
the correct causal direction between two random variables. Significant progress
has been made through modelling the effect as a function of its cause and a
noise term, which allows us to leverage assumptions about the generating
function class. The recently introduced heteroscedastic location-scale noise
functional models (LSNMs) combine expressive power with identifiability
guarantees. LSNM model selection based on maximizing likelihood achieves
state-of-the-art accuracy, when the noise distributions are correctly
specified. However, through an extensive empirical evaluation, we demonstrate
that the accuracy deteriorates sharply when the form of the noise distribution
is misspecified by the user. Our analysis shows that the failure occurs mainly
when the conditional variance in the anti-causal direction is smaller than that
in the causal direction. As an alternative, we find that causal model selection
through residual independence testing is much more robust to noise
misspecification and misleading conditional variance.
( 2
min )
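The residual-independence alternative can be sketched with a distance-covariance statistic: fit each causal direction, then test whether the residual depends on the putative cause, preferring the direction with the weaker dependence. The least-squares fit below is a stand-in for the paper's location-scale models:

```python
def dist_cov(x, y):
    """Sample (squared) distance covariance; zero in the population limit
    iff x and y are independent."""
    n = len(x)
    A = [[abs(x[i] - x[j]) for j in range(n)] for i in range(n)]
    B = [[abs(y[i] - y[j]) for j in range(n)] for i in range(n)]
    def center(M):
        row = [sum(r) / n for r in M]
        grand = sum(row) / n
        return [[M[i][j] - row[i] - row[j] + grand for j in range(n)]
                for i in range(n)]
    A, B = center(A), center(B)
    return sum(A[i][j] * B[i][j] for i in range(n) for j in range(n)) / n ** 2

def residual(x, y):
    """Least-squares residual of y regressed on x (stand-in model fit)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
        / sum((xi - mx) ** 2 for xi in x)
    return [yi - (my + b * (xi - mx)) for xi, yi in zip(x, y)]

# direction selection: prefer the fit whose residual is less dependent on its input,
# i.e. compare dist_cov(x, residual(x, y)) against dist_cov(y, residual(y, x))
```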
Cohen et al. (2021) empirically study the evolution of the largest eigenvalue
of the loss Hessian, also known as sharpness, along the gradient descent (GD)
trajectory and observe the Edge of Stability (EoS) phenomenon. The sharpness
increases at the early phase of training (referred to as progressive
sharpening), and eventually saturates close to the threshold of $2 /
\text{(step size)}$. In this paper, we start by demonstrating through empirical
studies that when the EoS phenomenon occurs, different GD trajectories (after a
proper reparameterization) align on a specific bifurcation diagram independent
of initialization. We then rigorously prove this trajectory alignment
phenomenon for a two-layer fully-connected linear network and a single-neuron
nonlinear network trained with a single data point. Our trajectory alignment
analysis establishes both progressive sharpening and EoS phenomena,
encompassing and extending recent findings in the literature.
( 2
min )
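Sharpness here is the largest Hessian eigenvalue, which is measurable from Hessian-vector products alone; on a quadratic, gradient descent is stable iff sharpness < 2/(step size), which is the EoS threshold above. A minimal sketch on a toy diagonal Hessian (not the paper's networks):

```python
def sharpness(hvp, dim, iters=200):
    """Largest-magnitude Hessian eigenvalue via power iteration,
    using only Hessian-vector products hvp(v) = H @ v."""
    v = [1.0] * dim
    lam = 1.0
    for _ in range(iters):
        hv = hvp(v)
        lam = max(abs(c) for c in hv)  # infinity-norm normalisation
        v = [c / lam for c in hv]
    return lam

# toy quadratic loss 0.5 * (3*w0^2 + w1^2): Hessian = diag(3, 1)
hvp = lambda v: [3.0 * v[0], 1.0 * v[1]]
lam_max = sharpness(hvp, 2)   # -> 3.0
stable_step = 2.0 / lam_max   # GD on this quadratic diverges above this step size
```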
Agglomerative hierarchical clustering based on Ordered Weighted Averaging
(OWA) operators not only generalises the single, complete, and average
linkages, but also includes intercluster distances based on a few nearest or
farthest neighbours, trimmed and winsorised means of pairwise point
similarities, amongst many others. We explore the relationships between the
famous Lance-Williams update formula and the extended OWA-based linkages with
weights generated via infinite coefficient sequences. Furthermore, we provide
some conditions for the weight generators to guarantee the resulting
dendrograms to be free from unaesthetic inversions.
( 2
min )
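The OWA generalisation above can be sketched directly: sort all pairwise intercluster distances, then take a weighted average, with the weight vector selecting the linkage. Weights concentrated on the smallest distance give single linkage, on the largest give complete linkage, and uniform weights give average linkage:

```python
def owa_linkage(cluster_a, cluster_b, weights, dist):
    """Intercluster distance as an Ordered Weighted Average of all pairwise
    distances, sorted ascending; `weights(m)` generates the weight vector
    for m pairwise distances."""
    d = sorted(dist(a, b) for a in cluster_a for b in cluster_b)
    w = weights(len(d))
    return sum(wi * di for wi, di in zip(w, d))

euclid = lambda p, q: abs(p - q)
A, B = [0.0, 1.0], [3.0, 5.0]   # pairwise distances: [2, 3, 4, 5]
single   = owa_linkage(A, B, lambda m: [1.0] + [0.0] * (m - 1), euclid)  # 2.0
complete = owa_linkage(A, B, lambda m: [0.0] * (m - 1) + [1.0], euclid)  # 5.0
average  = owa_linkage(A, B, lambda m: [1.0 / m] * m, euclid)            # 3.5
```

Other weight generators give the trimmed and winsorised variants the abstract mentions.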
We describe a new direct method to estimate bipartite mutual information of a
classical spin system based on Monte Carlo sampling enhanced by autoregressive
neural networks. It allows studying arbitrary geometries of subsystems and can
be generalized to classical field theories. We demonstrate it on the Ising
model for four partitionings, including a multiply-connected even-odd division.
We show that the area law is satisfied for temperatures away from the critical
temperature: the constant term is universal, whereas the proportionality
coefficient is different for the even-odd partitioning.
( 2
min )
Graph generative model evaluation necessitates understanding differences
between graphs on the distributional level. This entails being able to harness
salient attributes of graphs in an efficient manner. Curvature constitutes one
such property that has recently proved its utility in characterising graphs.
Its expressive properties, stability, and practical utility in model evaluation
remain largely unexplored, however. We combine graph curvature descriptors with
emerging methods from topological data analysis to obtain robust, expressive
descriptors for evaluating graph generative models.
( 2
min )
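One concrete curvature descriptor of the kind the abstract refers to is combinatorial Forman-Ricci curvature, which for an unweighted graph reduces to a degree formula per edge; its distribution over edges is a cheap graph-level signature that can be compared across generated and reference graphs:

```python
def forman_curvature(adj):
    """Forman-Ricci curvature of each edge of an unweighted graph:
    F(u, v) = 4 - deg(u) - deg(v)."""
    return {(u, v): 4 - len(adj[u]) - len(adj[v])
            for u in adj for v in adj[u] if u < v}

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}   # every edge: 4 - 2 - 2 = 0
path = {0: [1], 1: [0, 2], 2: [1]}             # end edges: 4 - 1 - 2 = 1
```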
Generative diffusion models have achieved spectacular performance in many
areas of generative modeling. While the fundamental ideas behind these models
come from non-equilibrium physics, in this paper we show that many aspects of
these models can be understood using the tools of equilibrium statistical
mechanics. Using this reformulation, we show that generative diffusion models
undergo second-order phase transitions corresponding to symmetry breaking
phenomena. We argue that this leads to a form of instability that lies at the
heart of their generative capabilities and that can be described by a set of
mean field critical exponents. We conclude by analyzing recent work connecting
diffusion models and associative memory networks in view of the thermodynamic
formulations.
( 2
min )
Flexible models for probability distributions are an essential ingredient in
many machine learning tasks. We develop and investigate a new class of
probability distributions, which we call a Squared Neural Family (SNEFY),
formed by squaring the 2-norm of a neural network and normalising it with
respect to a base measure. Following reasoning similar to the well-established
connections between infinitely wide neural networks and Gaussian
processes, we show that SNEFYs admit closed form normalising constants in many
cases of interest, thereby resulting in flexible yet fully tractable density
models. SNEFYs strictly generalise classical exponential families, are closed
under conditioning, and have tractable marginal distributions. Their utility is
illustrated on a variety of density estimation, conditional density estimation,
and density estimation with missing data tasks.
( 2
min )
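The core SNEFY construction, squaring the 2-norm of a network's output and normalising against a base measure, can be sketched in one dimension. The cosine-feature network and the numeric normalisation on a grid are illustrative stand-ins; the paper's point is that for many feature choices the normalising constant has a closed form:

```python
import math

def snefy_density_1d(W, b, xs):
    """Unnormalised SNEFY density ||f(x)||_2^2, with f(x) = (cos(w_j*x + b_j))_j
    a one-layer cosine-feature net, normalised numerically on the grid xs
    w.r.t. the Lebesgue base measure."""
    sq = lambda x: sum(math.cos(w * x + c) ** 2 for w, c in zip(W, b))
    dx = xs[1] - xs[0]
    Z = sum(sq(x) for x in xs) * dx   # numeric stand-in for the closed-form constant
    return [sq(x) / Z for x in xs]

grid = [i * 0.01 for i in range(628)]              # roughly [0, 2*pi)
dens = snefy_density_1d([1.0, 2.0], [0.0, 0.5], grid)
```

Squaring guarantees non-negativity by construction, which is why the family stays a valid density for any network parameters.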
Neural additive models (NAMs) can improve the interpretability of deep neural
networks by handling input features in separate additive sub-networks. However,
they lack inherent mechanisms that provide calibrated uncertainties and enable
selection of relevant features and interactions. Approaching NAMs from a
Bayesian perspective, we enhance them in three primary ways, namely by a)
providing credible intervals for the individual additive sub-networks; b)
estimating the marginal likelihood to perform an implicit selection of features
via an empirical Bayes procedure; and c) enabling a ranking of feature pairs as
candidates for second-order interaction in fine-tuned models. In particular, we
develop Laplace-approximated NAMs (LA-NAMs), which show improved empirical
performance on tabular datasets and challenging real-world medical tasks.
( 2
min )
Stein thinning is a promising algorithm proposed by Riabiz et al. (2022) for
post-processing outputs of Markov chain Monte Carlo (MCMC). The main principle
is to greedily minimize the kernelized Stein discrepancy (KSD), which only
requires the gradient of the log-target distribution, and is thus well-suited
for Bayesian inference. The main advantages of Stein thinning are the automatic
removal of the burn-in period, the correction of the bias introduced by recent
MCMC algorithms, and the asymptotic properties of convergence towards the
target distribution. Nevertheless, Stein thinning suffers from several
empirical pathologies, which may result in poor approximations, as observed in
the literature. In this article, we conduct a theoretical analysis of these
pathologies, to clearly identify the mechanisms at stake, and suggest improved
strategies. Then, we introduce the regularized Stein thinning algorithm to
alleviate the identified pathologies. Finally, theoretical guarantees and
extensive experiments show the high efficiency of the proposed algorithm. An
implementation of regularized Stein thinning as the kernax library in python
and JAX is available at https://gitlab.com/drti/kernax.
( 3
min )
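The greedy KSD-minimisation step at the heart of (unregularised) Stein thinning can be sketched in one dimension with a Gaussian kernel; note the algorithm only needs the score function, the gradient of the log target density, exactly as the abstract says. The Markov-chain draws below are illustrative:

```python
import math

def stein_thin(samples, score, m):
    """Greedy kernelised Stein discrepancy minimisation (1-D, Gaussian kernel
    k(x, y) = exp(-(x - y)^2 / 2)); `score` is grad log of the target density."""
    def k0(x, y):  # Stein kernel built from k and the score
        k = math.exp(-0.5 * (x - y) ** 2)
        d = x - y
        return ((1 - d * d) + score(x) * d - score(y) * d
                + score(x) * score(y)) * k
    chosen = []
    for _ in range(m):
        best = min(samples,
                   key=lambda x: k0(x, x) + 2 * sum(k0(z, x) for z in chosen))
        chosen.append(best)
    return chosen

# thin MCMC draws towards a standard normal target, where score(x) = -x
draws = [-2.0, -1.0, -0.3, 0.1, 0.9, 1.5]
thinned = stein_thin(draws, lambda x: -x, 3)
```

The pathologies the article analyses arise from exactly this greedy objective, which its regularised variant then modifies.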
The out-of-sample error (OO) is the main quantity of interest in risk
estimation and model selection. Leave-one-out cross validation (LO) offers a
(nearly) distribution-free yet computationally demanding approach to estimate
OO. Recent theoretical work showed that approximate leave-one-out cross
validation (ALO) is a computationally efficient and statistically reliable
estimate of LO (and OO) for generalized linear models with differentiable
regularizers. For problems involving non-differentiable regularizers, despite
significant empirical evidence, the theoretical understanding of ALO's error
remains unknown. In this paper, we present a novel theory for a wide class of
problems in the generalized linear model family with non-differentiable
regularizers. We bound the error |ALO - LO| in terms of intuitive metrics such
as the size of leave-i-out perturbations in active sets, sample size n, number
of features p and regularization parameters. As a consequence, for the
$\ell_1$-regularized problems, we show that |ALO - LO| goes to zero as p goes
to infinity while n/p and SNR are fixed and bounded.
( 2
min )
Many real-world domains require safe decision making in uncertain
environments. In this work, we introduce a deep reinforcement learning
framework for approaching this important problem. We consider a distribution
over transition models, and apply a risk-averse perspective towards model
uncertainty through the use of coherent distortion risk measures. We provide
robustness guarantees for this framework by showing it is equivalent to a
specific class of distributionally robust safe reinforcement learning problems.
Unlike existing approaches to robustness in deep reinforcement learning,
however, our formulation does not involve minimax optimization. This leads to
an efficient, model-free implementation of our approach that only requires
standard data collection from a single training environment. In experiments on
continuous control tasks with safety constraints, we demonstrate that our
framework produces robust performance and safety at deployment time across a
range of perturbed test environments.
( 2
min )
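A standard example of a coherent distortion risk measure of the kind applied above is conditional value-at-risk (CVaR), the mean of the worst alpha-fraction of outcomes. A minimal sketch on hypothetical sampled returns (the framework applies this kind of weighting over a distribution of transition models, not shown here):

```python
def cvar(outcomes, alpha=0.2):
    """Conditional value-at-risk at level alpha: the mean of the worst
    alpha-fraction of outcomes (a coherent distortion risk measure)."""
    xs = sorted(outcomes)
    k = max(1, int(len(xs) * alpha))
    return sum(xs[:k]) / k

returns = [10.0, 8.0, -5.0, 7.0, 9.0]          # hypothetical sampled returns
risk_averse_value = cvar(returns, alpha=0.4)   # mean of the worst 2: 1.0
```

Replacing a plain expectation with such a measure penalises models under which outcomes are poor, which is what yields the distributional-robustness equivalence without minimax optimization.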
Generative artificial intelligence is transforming how enterprises do business. Organizations are using AI to improve data-driven decisions, enhance omnichannel experiences, and drive next-generation product development. Enterprises are using generative AI specifically to power their marketing efforts through emails, push notifications, and other outbound communication channels. Gartner predicts that “by 2025, 30% of outbound marketing messages […]
( 8
min )
Visualization is vital for understanding complex data, but existing tools require “tidy data,” adding extra steps. Learn how Data Formulator transforms concepts into visuals, promoting collaboration between analysts and AI agents.
The post Data Formulator: A concept-driven, AI-powered approach to data visualization appeared first on Microsoft Research.
( 10
min )
AI Weirdness: the strange side of machine learning
( 2
min )
Amazon Kendra is an intelligent search service powered by machine learning (ML). Amazon Kendra helps you easily aggregate content from a variety of content repositories into a centralized index that lets you quickly search all your enterprise data and find the most accurate answer. Drupal is a content management software. It’s used to make many […]
( 7
min )
This is a guest post by Jose Benitez, Founder and Director of AI and Mattias Ponchon, Head of Infrastructure at Intuitivo. Intuitivo, a pioneer in retail innovation, is revolutionizing shopping with its cloud-based AI and machine learning (AI/ML) transactional processing system. This groundbreaking technology enables us to operate millions of autonomous points of purchase (A-POPs) […]
( 8
min )
Enterprises seek to harness the potential of Machine Learning (ML) to solve complex problems and improve outcomes. Until recently, building and deploying ML models required deep levels of technical and coding skills, including tuning ML models and maintaining operational pipelines. Since its introduction in 2021, Amazon SageMaker Canvas has enabled business analysts to build, deploy, […]
( 8
min )
Researchers are taking deep learning for a deep dive, literally. The Woods Hole Oceanographic Institution (WHOI) Autonomous Robotics and Perception Laboratory (WARPLab) and MIT are developing a robot for studying coral reefs and their ecosystems. The WARPLab autonomous underwater vehicle (AUV), enabled by an NVIDIA Jetson Orin NX module, is an effort from the world’s Read article >
( 8
min )
The cloud is full of treats this GFN Thursday with Cities: Skylines II now streaming, leading 15 newly supported games this week. The game’s publisher, Paradox Interactive, is offering GeForce NOW one-month Priority memberships for those who pick up the game first, so make sure to grab one before they’re gone. Among the newly supported Read article >
( 7
min )
This research paper was presented at the 29th ACM Symposium on Operating Systems Principles (opens in new tab) (SOSP 2023), the premier forum for the theory and practice of computer systems software. For millennia, data has woven itself into every facet of our lives, from business and academia to personal spheres. Our production of data […]
The post Project Silica: Sustainable cloud archival storage in glass appeared first on Microsoft Research.
( 10
min )
Methane (CH4) is a major anthropogenic greenhouse gas that's a by-product of oil and gas extraction, coal mining, large-scale animal farming, and waste disposal, among other sources. The global warming potential of CH4 is 86 times that of CO2 and the Intergovernmental Panel on Climate Change (IPCC) estimates that methane is responsible for 30 percent of observed […]
( 12
min )
In this issue: Kosmos-2.5: A Multimodal Literate Model; Can vine copulas explain complex relationships of weather variables; New system accelerates the adaptive training process; Structural inequalities and relational labor in the influencer industry.
The post Research Focus: Week of October 23, 2023 appeared first on Microsoft Research.
( 10
min )
NVIDIA researchers are collaborating with academic centers worldwide to advance generative AI, robotics and the natural sciences — and more than a dozen of these projects will be shared at NeurIPS, one of the world’s top AI conferences. Set for Dec. 10-16 in New Orleans, NeurIPS brings together experts in generative AI, machine learning, computer Read article >
( 8
min )
In today’s information age, the vast volumes of data housed in countless documents present both a challenge and an opportunity for businesses. Traditional document processing methods often fall short in efficiency and accuracy, leaving room for innovation, cost-efficiency, and optimizations. Document processing has witnessed significant advancements with the advent of Intelligent Document Processing (IDP). With […]
( 20
min )
This post is co-authored by Dhurjati Brahma, Senior Systems Architect; Jim Chao, Principal Engineer/Architect; and Nicholas Zellerhoff, Associate Systems Architect, all at T-Mobile US, Inc. T-Mobile US, Inc. provides a Voicemail to Text service to its customers, which allows customers to quickly read through their voicemails and […]
( 7
min )
Written by Venkata Nori and Kshitij Gopali. Introduction: As technology evolves, most companies in the world are adopting advanced mechanisms for their daily tasks of storing and updating data, project management and tracking, incident management, version control, etc. Periodically, these companies’ business stakeholders want to extract and analyze the data to see how the business…
The post Seamless integration of data from unconventional source systems into Business Intelligence using data science techniques appeared first on Data Science Central.
( 25
min )
A recent interview by Medical Device Network with GlobalData medical analyst Alexandra Murdoch shares interesting insights into cybersecurity for medical devices.
The post How data science and medical device cybersecurity cross paths to protect patients and enhance healthcare appeared first on Data Science Central.
( 22
min )
Visual effects artist Surfaced Studio returns to 'In the NVIDIA Studio' to share his real-world VFX project, created on a brand new Razer Blade 16 Mercury Edition laptop powered by GeForce RTX 4080 graphics.
( 8
min )
This post is co-authored by Anatoly Khomenko, Machine Learning Engineer, and Abdenour Bezzouh, Chief Technology Officer at Talent.com. Founded in 2011, Talent.com is one of the world’s largest sources of employment. The company combines paid job listings from their clients with public job listings into a single searchable platform. With over 30 million jobs listed […]
( 12
min )
Images such as those in Google Street View are taking on a new purpose in the hands of University of Florida Assistant Professor of Artificial Intelligence Chaofeng Wang. He’s using them, along with deep learning, in a research project to automate the evaluation of urban buildings. The project aims to help governments mitigate natural disaster […]
( 6
min )
The 15th Kendall Square Association annual meeting explored new and old aspects of the neighborhood.
( 9
min )
Pulsar timing arrays (PTAs) perform Bayesian posterior inference with
expensive MCMC methods. Given a dataset of ~10-100 pulsars and O(10^3) timing
residuals each, producing a posterior distribution for the stochastic
gravitational wave background (SGWB) can take days to a week. The computational
bottleneck arises because the likelihood evaluation required for MCMC is
extremely costly when considering the dimensionality of the search space.
Fortunately, generating simulated data is fast, so modern simulation-based
inference techniques can be brought to bear on the problem. In this paper, we
demonstrate how conditional normalizing flows trained on simulated data can be
used for extremely fast and accurate estimation of the SGWB posteriors,
reducing the sampling time from weeks to a matter of seconds.
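The amortized-inference idea can be sketched with a deliberately simplified stand-in: instead of a conditional normalizing flow, a closed-form linear-Gaussian regression is fit to simulated (parameter, data) pairs, so that inference for new observations costs a single arithmetic step instead of an MCMC run. The simulator and all numbers below are illustrative, not drawn from the paper.

```python
import random
import statistics

random.seed(0)

def simulate(theta):
    # Fast forward model: a noisy observation of the parameter
    # (a stand-in for generating pulsar timing residuals).
    return theta + random.gauss(0.0, 0.5)

# 1) Generate simulated (parameter, data) pairs -- cheap.
thetas = [random.uniform(-2.0, 2.0) for _ in range(20000)]
data = [simulate(t) for t in thetas]

# 2) "Train" an amortized posterior estimator: a closed-form linear
#    regression of theta on data, standing in for a conditional
#    normalizing flow trained on the same pairs.
mx, my = statistics.fmean(data), statistics.fmean(thetas)
cov = sum((x - mx) * (t - my) for x, t in zip(data, thetas)) / len(data)
slope = cov / statistics.pvariance(data)
intercept = my - slope * mx

def posterior_mean(observed):
    # 3) Inference for new data is now one arithmetic step,
    #    with no MCMC chain required.
    return intercept + slope * observed

est = posterior_mean(1.0)
```

The training set is generated once; every subsequent observation is handled by `posterior_mean` in constant time, which is the source of the weeks-to-seconds speedup the abstract describes.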
( 2
min )
Randomized experimental comparisons of alternative pedagogical strategies
could provide useful empirical evidence for instructors' decision-making.
However, traditional experiments offer no clear and simple pathway for rapidly
using data to increase the chances that students in an experiment receive the
best conditions. Drawing inspiration from the use of machine
learning and experimentation in product development at leading technology
companies, we explore how adaptive experimentation might help in continuous
course improvement. In adaptive experiments, as different arms/conditions are
deployed to students, data is analyzed and used to change the experience for
future students. This can be done using machine learning algorithms to identify
which actions are more promising for improving student experience or outcomes.
This algorithm can then dynamically deploy the most effective conditions to
future students, resulting in better support for students' needs. We illustrate
the approach with a case study providing a side-by-side comparison of
traditional and adaptive experimentation of self-explanation prompts in online
homework problems in a CS1 course. This provides a first step in exploring
how this methodology can help bridge research and practice in continuous
course improvement.
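One standard algorithm for adaptive experiments of this kind is Beta-Bernoulli Thompson sampling, sketched below: as outcomes arrive, the posterior over each condition's success rate sharpens and more students are routed to the better arm. The two condition names and their success rates are hypothetical.

```python
import random

random.seed(1)

# Two pedagogical conditions with true success rates that are
# unknown to the algorithm; names and rates are hypothetical.
TRUE_RATES = {"prompted": 0.7, "control": 0.5}

# Beta(1, 1) priors, stored as [successes + 1, failures + 1].
stats = {arm: [1, 1] for arm in TRUE_RATES}

def choose_arm():
    # Thompson sampling: draw once from each arm's Beta posterior
    # and deploy the arm with the highest draw.
    draws = {arm: random.betavariate(s, f) for arm, (s, f) in stats.items()}
    return max(draws, key=draws.get)

assignments = {arm: 0 for arm in TRUE_RATES}
for _student in range(2000):
    arm = choose_arm()
    assignments[arm] += 1
    if random.random() < TRUE_RATES[arm]:
        stats[arm][0] += 1   # success
    else:
        stats[arm][1] += 1   # failure
```

After a short exploration phase, most students end up assigned to the higher-rate condition, which is exactly the "deploy the most effective conditions to future students" behavior described above.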
( 2
min )
Graph neural networks (GNNs) have gained significant popularity due to their
powerful capability to extract useful representations from graph data. As the
need for efficient GNN computation intensifies, a variety of programming
abstractions designed for optimizing GNN Aggregation have emerged to facilitate
acceleration. However, there is no comprehensive evaluation and analysis of
existing abstractions, and thus no clear consensus on which approach is better. In
this letter, we classify existing programming abstractions for GNN Aggregation
by the dimension of data organization and propagation method. By constructing
these abstractions on a state-of-the-art GNN library, we perform a thorough and
detailed characterization study to compare their performance and efficiency,
and provide several insights on future GNN acceleration based on our analysis.
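The two broad data organizations such characterizations compare can be illustrated with toy pure-Python stand-ins for the kernels a GNN library would generate: edge-parallel scatter over a COO edge list versus node-parallel gather over a CSR structure, both computing the same sum aggregation. Graph, features, and function names are illustrative.

```python
# Sum aggregation over a toy directed graph, expressed under two
# common programming abstractions.

edges = [(0, 1), (0, 2), (1, 2), (2, 0)]   # (src, dst) pairs
feats = {0: 1.0, 1: 10.0, 2: 100.0}        # node features
num_nodes = 3

def aggregate_scatter(edges, feats, n):
    # COO / edge-parallel: iterate edges, scatter-add src into dst.
    out = [0.0] * n
    for src, dst in edges:
        out[dst] += feats[src]
    return out

def build_csr(edges, n):
    # Group each node's in-neighbors into a CSR layout.
    buckets = [[] for _ in range(n)]
    for src, dst in edges:
        buckets[dst].append(src)
    indptr, indices = [0], []
    for b in buckets:
        indices.extend(b)
        indptr.append(len(indices))
    return indptr, indices

def aggregate_gather(edges, feats, n):
    # CSR / node-parallel: for each node, gather its in-neighbors.
    indptr, indices = build_csr(edges, n)
    return [sum(feats[s] for s in indices[indptr[v]:indptr[v + 1]])
            for v in range(n)]

coo_out = aggregate_scatter(edges, feats, num_nodes)
csr_out = aggregate_gather(edges, feats, num_nodes)
```

Both abstractions compute identical results; they differ in memory layout and parallelization strategy, which is what the characterization study measures.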
( 2
min )
The performance of neural networks has been significantly improved by
increasing the number of channels in convolutional layers. However, this
increase in performance comes with a higher computational cost, resulting in
numerous studies focused on reducing it. One promising approach to address this
issue is group convolution, which effectively reduces the computational cost by
grouping channels. However, to the best of our knowledge, there has been no
theoretical analysis on how well the group convolution approximates the
standard convolution. In this paper, we mathematically analyze the
approximation of the group convolution to the standard convolution with respect
to the number of groups. Furthermore, we propose a novel variant of the group
convolution called balanced group convolution, which achieves a better
approximation at a small additional computational cost. We provide
experimental results that validate our theoretical findings and demonstrate the
superior performance of the balanced group convolution over other variants of
group convolution.
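The cost saving from grouping can be made concrete with the usual parameter-count formula; this is a sketch assuming channel counts divisible by the number of groups, with illustrative layer sizes.

```python
def conv_params(c_in, c_out, k, groups=1):
    # Each group connects c_in/groups input channels to
    # c_out/groups output channels with k x k kernels.
    assert c_in % groups == 0 and c_out % groups == 0
    return groups * (c_in // groups) * (c_out // groups) * k * k

std = conv_params(256, 256, 3)             # standard convolution
grp = conv_params(256, 256, 3, groups=8)   # group convolution

# Grouping divides both parameters and multiply-adds by `groups`,
# at the cost of removing cross-group channel interactions --
# the gap the paper's approximation analysis quantifies.
```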
( 2
min )
Molecular language modeling is an effective approach to generating novel
chemical structures. However, these models do not a priori encode
certain preferences a chemist may desire. We investigate fine-tuning with
Direct Preference Optimization to better align generated molecules with
chemist preferences. Our findings suggest that this approach is simple,
efficient, and highly effective.
( 2
min )
On-device machine learning (ML) enables the training process to exploit a
massive amount of user-generated private data samples. To enjoy this benefit,
inter-device communication overhead should be minimized. To this end, we
propose federated distillation (FD), a distributed model training algorithm
whose communication payload size is much smaller than that of a benchmark scheme,
federated learning (FL), particularly when the model size is large. Moreover,
user-generated data samples are likely to become non-IID across devices, which
commonly degrades the performance compared to the case with an IID dataset. To
cope with this, we propose federated augmentation (FAug), where each device
collectively trains a generative model, and thereby augments its local data
towards yielding an IID dataset. Empirical studies demonstrate that FD with
FAug yields around 26x less communication overhead while achieving 95-98% test
accuracy compared to FL.
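A back-of-the-envelope sketch of the payload gap: FL uploads every model weight, whereas FD uploads only per-label mean logit vectors distilled from local predictions. All sizes here are illustrative, not taken from the paper.

```python
NUM_WEIGHTS = 1_000_000    # a modest model, for illustration
NUM_CLASSES = 10
BYTES_PER_FLOAT = 4

# FL: the full weight vector crosses the network each round.
fl_payload = NUM_WEIGHTS * BYTES_PER_FLOAT

# FD: one averaged NUM_CLASSES-dim logit vector per label.
fd_payload = NUM_CLASSES * NUM_CLASSES * BYTES_PER_FLOAT

ratio = fl_payload // fd_payload
```

Because the FD payload scales with the number of classes rather than the number of weights, the gap widens as the model grows, matching the abstract's "particularly when the model size is large" remark.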
( 2
min )
This work introduces the first toolkit around path-norms that is fully able
to encompass general DAG ReLU networks with biases, skip connections and any
operation based on the extraction of order statistics: max pooling, GroupSort
etc. This toolkit notably allows us to establish generalization bounds for
modern neural networks that are not only the most widely applicable path-norm
based ones, but also recover or beat the sharpest known bounds of this type.
These extended path-norms further enjoy the usual benefits of path-norms: ease
of computation, invariance under the symmetries of the network, and improved
sharpness on feedforward networks compared to the product of operator norms,
the most commonly used alternative complexity measure.
The versatility of the toolkit and its ease of implementation allow us to
challenge the concrete promises of path-norm-based generalization bounds, by
numerically evaluating the sharpest known bounds for ResNets on ImageNet.
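For a bias-free two-layer ReLU network, the L1 path-norm can be computed directly from the entrywise absolute weights. The toy comparison below uses the product of layerwise entrywise 1-norms as a simple stand-in for the operator-norm products discussed above; the weight values are illustrative.

```python
# Toy bias-free two-layer ReLU network.
W1 = [[0.5, -1.0], [2.0, 0.1]]   # hidden x input
W2 = [[1.0, -0.5]]               # output x hidden

def path_norm(w1, w2):
    # L1 path-norm: sum over all input->hidden->output paths of
    # the absolute product of weights along the path.
    return sum(abs(w2[k][j]) * abs(w1[j][i])
               for k in range(len(w2))
               for j in range(len(w1))
               for i in range(len(w1[0])))

def layer_l1(w):
    # Entrywise 1-norm of a weight matrix.
    return sum(abs(x) for row in w for x in row)

pn = path_norm(W1, W2)
prod = layer_l1(W1) * layer_l1(W2)   # layerwise-product baseline
```

On this example the path-norm is strictly smaller than the layerwise product, illustrating the sharpness advantage the abstract claims; the path-norm is also invariant to rescaling one layer up and the next down, which the layerwise product is not.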
( 2
min )
A metric tensor for Riemann manifold Monte Carlo particularly suited for
non-linear Bayesian hierarchical models is proposed. The metric tensor is built
from symmetric positive semidefinite log-density gradient covariance (LGC)
matrices, which are also proposed and further explored here. The LGCs
generalize the Fisher information matrix by measuring the joint information
content and dependence structure of both a random variable and the parameters
of said variable. Consequently, positive definite Fisher/LGC-based metric
tensors may be constructed not only from the observation likelihoods as is
current practice, but also from arbitrarily complicated non-linear prior/latent
variable structures, provided the LGC may be derived for each conditional
distribution used to construct said structures. The proposed methodology is
highly automatic and allows for exploitation of any sparsity associated with
the model in question. When implemented in conjunction with a Riemann manifold
variant of the recently proposed numerical generalized randomized Hamiltonian
Monte Carlo processes, the proposed methodology is highly competitive, in
particular for the more challenging target distributions associated with
Bayesian hierarchical models.
( 2
min )
Inference on modern Bayesian Neural Networks (BNNs) often relies on a
variational inference treatment, imposing assumptions of independence and a
fixed posterior form that are frequently violated. Traditional MCMC approaches
avoid these assumptions at the cost of increased computation due to their
incompatibility with subsampling of the likelihood. New Piecewise Deterministic
Markov Process (PDMP) samplers permit subsampling, though they introduce
model-specific inhomogeneous Poisson processes (IPPs) which are difficult to sample from. This
work introduces a new generic and adaptive thinning scheme for sampling from
these IPPs, and demonstrates how this approach can accelerate the application
of PDMPs for inference in BNNs. Experiments illustrate that inference with
these methods is computationally feasible, can improve predictive accuracy and
MCMC mixing performance, and provides informative uncertainty measurements when
compared against other approximate inference schemes.
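Thinning schemes of this kind build on the classic Lewis-Shedler algorithm for sampling an inhomogeneous Poisson process given an upper bound on its rate; a minimal sketch with an illustrative sinusoidal intensity follows.

```python
import math
import random

random.seed(2)

def sample_ipp(rate, rate_bound, horizon):
    """Sample event times of an inhomogeneous Poisson process on
    [0, horizon] by thinning: propose from a homogeneous process
    with intensity rate_bound, accept each proposed point with
    probability rate(t) / rate_bound."""
    t, events = 0.0, []
    while True:
        # Homogeneous proposal: exponential inter-arrival times.
        t += random.expovariate(rate_bound)
        if t > horizon:
            return events
        if random.random() < rate(t) / rate_bound:
            events.append(t)

# Example: sinusoidal intensity 1 + sin(t), bounded above by 2.
events = sample_ipp(lambda t: 1.0 + math.sin(t), 2.0, 1000.0)
```

The efficiency of thinning hinges on how tight `rate_bound` is, which is precisely why an adaptive bound, as proposed in the work above, can accelerate PDMP samplers whose event rates are model-specific and hard to bound a priori.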
( 2
min )
Compressing neural networks is a key step when deploying models for real-time
or embedded applications. Factorizing the model's matrices using low-rank
approximations is a promising method for achieving compression. While it is
possible to set the rank before training, this approach is neither flexible nor
optimal. In this work, we propose a post-training rank-selection method called
Rank-Tuning that selects a different rank for each matrix. Used in combination
with training adaptations, our method achieves high compression rates with
little or no performance degradation. Our numerical experiments on signal
processing tasks show that we can compress recurrent neural networks up to 14x
with at most 1.4% relative performance reduction.
( 2
min )
We study the performance of empirical risk minimization on the $p$-norm
linear regression problem for $p \in (1, \infty)$. We show that, in the
realizable case, under no moment assumptions, and up to a
distribution-dependent constant, $O(d)$ samples are enough to exactly recover
the target. Otherwise, for $p \in [2, \infty)$, and under weak moment
assumptions on the target and the covariates, we prove a high probability
excess risk bound on the empirical risk minimizer whose leading term matches,
up to a constant that depends only on $p$, the asymptotically exact rate. We
extend this result to the case $p \in (1, 2)$ under mild assumptions that
guarantee the existence of the Hessian of the risk at its minimizer.
( 2
min )
We introduce a novel approach to explaining the out-of-sample performance of
random forest (RF) models by exploiting the fact that any RF can be formulated
as an adaptive weighted K nearest-neighbors model. Specifically, we use the
proximity between points in the feature space learned by the RF to re-write
random forest predictions exactly as a weighted average of the target labels of
training data points. This linearity facilitates a local notion of
explainability of RF predictions that generates attributions for any model
prediction across observations in the training set, and thereby complements
established methods like SHAP, which instead generate attributions for a model
prediction across dimensions of the feature space. We demonstrate this approach
in the context of a bond pricing model trained on US corporate bond trades, and
compare our approach to various existing approaches to model explainability.
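The exact rewriting of a forest prediction as a weighted average of training labels can be verified on a toy example in which the "trees" are hand-made leaf assignment rules on 1-D inputs; all data and names are illustrative.

```python
X = [0.1, 0.4, 0.6, 0.9]   # training inputs
y = [1.0, 2.0, 3.0, 4.0]   # training labels

# Each "tree" maps an input to a leaf id (here a boolean).
trees = [lambda x: x < 0.5, lambda x: x < 0.7]

def forest_predict(x):
    # Standard view: average the per-tree leaf means.
    preds = []
    for tree in trees:
        leaf = tree(x)
        members = [yi for xi, yi in zip(X, y) if tree(xi) == leaf]
        preds.append(sum(members) / len(members))
    return sum(preds) / len(preds)

def proximity_predict(x):
    # kNN view: prediction = sum_i w_i(x) * y_i, with weights
    # accumulated from leaf co-membership across trees.
    w = [0.0] * len(X)
    for tree in trees:
        leaf = tree(x)
        idx = [i for i, xi in enumerate(X) if tree(xi) == leaf]
        for i in idx:
            w[i] += 1.0 / (len(idx) * len(trees))
    return sum(wi * yi for wi, yi in zip(w, y))

p_forest = forest_predict(0.3)
p_prox = proximity_predict(0.3)
```

The two views agree exactly; the weights `w_i(x)` are the per-observation attributions the abstract describes, complementary to SHAP's per-feature attributions.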
( 2
min )
Customers of every size and industry are innovating on AWS by infusing machine learning (ML) into their products and services. Recent developments in generative AI models have further sped up the need of ML adoption across industries. However, implementing security, data privacy, and governance controls are still key challenges faced by customers when implementing ML […]
( 16
min )
This is a guest post co-written by Rama Badrinath, Divay Jindal and Utkarsh Agrawal at Meesho. Meesho is India’s fastest growing ecommerce company with a mission to democratize internet commerce for everyone and make it accessible to the next billion users of India. Meesho was founded in 2015 and today focuses on buyers and sellers […]
( 6
min )
GPU-powered surgical-simulation devices are helping train more than 2,000 doctors a year in lower-income countries to treat cataract blindness, the world’s leading cause of blindness, thanks to the nonprofit HelpMeSee. While cataract surgery has a success rate of around 99%, many patients in low- and middle-income countries lack access to the common procedure due to Read article >
( 6
min )
A new AI agent developed by NVIDIA Research that can teach robots complex skills has trained a robotic hand to perform rapid pen-spinning tricks — for the first time as well as a human can. The stunning prestidigitation, showcased in the video above, is one of nearly 30 tasks that robots have learned to expertly Read article >
( 6
min )
Companies increasingly rely on user-generated images and videos for engagement. From ecommerce platforms encouraging customers to share product images to social media companies promoting user-generated videos and images, using user content for engagement is a powerful strategy. However, it can be challenging to ensure that this user-generated content is consistent with your policies and fosters […]
( 7
min )
High-resolution imagery is very prevalent in today’s world, from satellite imagery to drones and DLSR cameras. From this imagery, we can capture damage due to natural disasters, anomalies in manufacturing equipment, or very small defects such as defects on printed circuit boards (PCBs) or semiconductors. Building anomaly detection models using high-resolution imagery can be challenging […]
( 8
min )
Customers increasingly want to use deep learning approaches such as large language models (LLMs) to automate the extraction of data and insights. For many industries, data that is useful for machine learning (ML) may contain personally identifiable information (PII). To ensure customer privacy and maintain regulatory compliance while training, fine-tuning, and using deep learning models, […]
( 12
min )
To enable professionals worldwide to build and run AI applications right from their desktops, NVIDIA and AMD are powering a new line of workstations equipped with NVIDIA RTX Ada Generation GPUs and AMD Ryzen Threadripper PRO 7000 WX-Series CPUs. Bringing together the highest levels of AI computing, rendering and simulation capabilities, these new platforms enable […]
( 5
min )
Training generative AI models just got easier. NVIDIA DGX Cloud AI supercomputing platform and NVIDIA AI Enterprise software are now available in Oracle Cloud Marketplace, making it possible for Oracle Cloud Infrastructure customers to access high-performance accelerated computing and software to run secure, stable and supported production AI in just a few clicks. The addition […]
( 6
min )
Rush to the cloud — stream Counter-Strike 2 on GeForce NOW for the highest frame rates. Members can play through the newest chapter of Valve’s elite, competitive, first-person shooter from the cloud. It’s all part of an action-packed GFN Thursday, with 22 more games joining the cloud gaming platform’s library, including Hot Wheels Unleashed 2 […]
( 5
min )
We developed a safety mitigation stack to ready DALL·E 3 for wider release and are sharing updates on our provenance research.
( 3
min )
AI models that prioritize similarity falter when asked to design something completely new.
( 10
min )
The award honors research on public policy with a focus on economic and governmental reforms.
( 7
min )
Purina US, a subsidiary of Nestlé, has a long history of enabling people to more easily adopt pets through Petfinder, a digital marketplace of over 11,000 animal shelters and rescue groups across the US, Canada, and Mexico. As the leading pet adoption platform, Petfinder has helped millions of pets find their forever homes. Purina consistently […]
( 9
min )
This position research paper was presented at the 26th ACM Conference on Computer-Supported Cooperative Work and Social Computing (opens in new tab) (CSCW 2023), a premier venue for research on the design and use of technologies that affect groups, organizations, and communities. In the business world, measuring success is as critical as selecting the right […]
The post Understanding the user: How the Enterprise System Usability Scale aligns with user reality appeared first on Microsoft Research.
( 10
min )
Powerful generative AI models and cloud-native APIs and microservices are coming to the edge. Generative AI is bringing the power of transformer models and large language models to virtually every industry. That reach now includes areas that touch edge, robotics and logistics systems: defect detection, real-time asset tracking, autonomous planning and navigation, human-robot interactions and […]
( 8
min )
Artificial intelligence is now a household term. Responsible AI is hot on its heels. Julia Stoyanovich, associate professor of computer science and engineering at NYU and director of the university’s Center for Responsible AI, wants to make the terms “AI” and “responsible AI” synonymous. In the latest episode of the NVIDIA AI Podcast, host Noah […]
( 6
min )
Real-time rendering, animation and texture baking are essential workflows for 3D art production. Using the Marmoset Toolbag software, 3D artists can enhance their creative workflows and build complex 3D models without disruptions to productivity.
( 7
min )
NVIDIA founder and CEO Jensen Huang joined Hon Hai (Foxconn) Chairman and CEO Young Liu to unveil the latest in their ongoing partnership to develop the next wave of intelligent electric vehicle (EV) platforms for the global automotive market. This latest move, announced today at the fourth annual Hon Hai Tech Day in Taiwan, will […]
( 6
min )
Amazon Pharmacy is a full-service pharmacy on Amazon.com that offers transparent pricing, clinical and customer support, and free delivery right to your door. Customer care agents play a crucial role in quickly and accurately retrieving information related to pharmacy information, including prescription clarifications and transfer status, order and dispensing details, and patient profile information, in […]
( 8
min )
At Amazon Web Services (AWS), not only are we passionate about providing customers with a variety of comprehensive technical solutions, but we’re also keen on deeply understanding our customers’ business processes. We adopt a third-party perspective and objective judgment to help customers sort out their value propositions, collect pain points, propose appropriate solutions, and create […]
( 16
min )
Amazon Personalize has launched a new integration with Amazon OpenSearch Service that enables you to personalize search results for each user and assists in predicting their search needs. The Amazon Personalize Search Ranking plugin within OpenSearch Service allows you to improve the end-user engagement and conversion from your website and app search by taking advantage […]
( 7
min )
GeForce RTX and NVIDIA RTX GPUs, which are packed with dedicated AI processors called Tensor Cores, are bringing the power of generative AI natively to more than 100 million Windows PCs and workstations.
( 7
min )
NVIDIA today announced an update to RTX Video Super Resolution (VSR) that delivers greater overall graphical fidelity with preserved details, upscaling for native videos and support for GeForce RTX 20 Series GPUs.
( 7
min )
Researchers coaxed a family of generative AI models to work together to solve multistep robot manipulation problems.
( 11
min )
Some researchers see formal specifications as a way for autonomous systems to "explain themselves" to humans. But a new study finds that humans often don't understand them.
( 9
min )
Veriff is an identity verification platform partner for innovative growth-driven organizations, including pioneers in financial services, FinTech, crypto, gaming, mobility, and online marketplaces. In this post, we show you how Veriff standardized their model deployment workflow using Amazon SageMaker, reducing costs and development time.
( 8
min )
How trustworthy are generative pre-trained transformer (GPT) models? To answer this question, University of Illinois Urbana-Champaign, together with Stanford University, University of California, Berkeley, Center for AI Safety, and Microsoft Research, released a comprehensive trustworthiness evaluation platform for large language models (LLMs), which is presented in the recent paper: DecodingTrust: A Comprehensive Assessment of Trustworthiness […]
The post DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models appeared first on Microsoft Research.
( 11
min )
Similar to my article series on adversarial robustness, I was planning a series on bit error robustness accompanied by PyTorch code. Instead, due to time constraints, I decided to condense the information into a single article. The code for the originally planned six articles is available on GitHub.
The post Benchmarking Bit Errors in Quantized Neural Networks with PyTorch appeared first on David Stutz.
( 6
min )
Sponsored Post Attend the Data Science Symposium 2022 on November 8 The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all-day, in-person event will have three featured speakers and two tech talk tracks with four concurrent presentations in each track. The […]
The post Attend the Data Science Symposium 2022, November 8 in Cincinnati appeared first on Machine Learning Mastery.
( 10
min )